Abstract

This  paper  explores  learning  models  in  the  spirit  of  the  method  of 
fictitious  play.   We  extend  the  previous  literature  by  generalizing  the 
classes  of  behavior  rules,  and  by  considering  games  with  more  than  two 
players.   Most  importantly,  we  reformulate  the  study  of  convergence  to  mixed 
strategy  equilibrium,  using  Harsanyi's  notion  of  perturbed  games. 
1.  Introduction

Nash  equilibrium  describes  a  situation  in  which  players  have  identical  and 
exactly  correct  beliefs  about  the  strategies  each  player  will  choose.  How  and  when 
might  the  players  come  to  have  correct  beliefs,  or  at  least  beliefs  that  are  close 
enough  to  being  correct  that  the  outcome  corresponds  to  a  Nash  equilibrium? 
One  explanation  is  that  the  players  play  the  game  over  and  over,  and  that  their 
beliefs  come  to  be  correct  as  the  result  of  learning  from  past  play.  This  explanation 
has  been  explored  at  some  length  in  the  recent  literature,  in  models  that  take  a 
number of different forms and that stress different aspects of the problem.

This paper explores learning models that are in the spirit of the model or
method of fictitious play (Brown, 1951; Robinson, 1951), in which players choose
their strategies to maximize their current period's expected payoff on the
assumption that their opponents will play each strategy with probability equal to
its historical frequency. We extend the previous literature in three ways:

(1)  We  provide  some  minor  extensions  to  the  basic  model  of  fictitious  play,  by 
generalizing  the  classes  of  rules  by  which  players  form  their  beliefs  and  use  them 
to  choose  their  actions. 

(2) We study models in the spirit of fictitious play for games with more than two
players. 

(3) Most importantly, we reformulate the study of convergence to mixed-strategy
equilibria. We argue that the notion of convergence used previously in the
literature, that the empirical marginal distributions converge, is not an appropriate
notion of what it means to play a mixed-strategy profile, and we suggest and
analyze the stronger criterion of the convergence of intended behavior. We show
that all Nash equilibria and only Nash equilibria are possible limit points under
this mode of convergence. Finally, we investigate the global stability of mixed
equilibria in the setting of Harsanyi's (1973) purification theorem.

Section  2  gives  a  general  formulation  of  learning  in  a  strategic-form  game. 
This  formulation,  and  our  subsequent  analysis,  supposes  that  the  same  players 
play each other repeatedly (as opposed to a model with a large number of player
1's, player 2's, etc.) and that in each round of play, players observe the (pure)
strategies  chosen  by  their  rivals. 

Section  3  reviews  the  model  of  fictitious  play  for  two  players.  We  separate 
the  questions  addressed  by  fictitious  play  (and  models  of  learning  in  general) 
into  two  groups.  First,  if  play  "settles  down"  or  converges  in  some  appropriate 
sense,  what  are  the  possible  limit  points?  Second,  is  play  guaranteed  to  converge? 

With regard to the first question, recall that there are two modes of
convergence in the literature on fictitious play. In the first, there is a finite time T such
that a single strategy profile is played in every period from T on; it is easy to
see that any such profile must be a Nash equilibrium. In the second mode of
convergence, play cycles among different strategy profiles in a way such that the
empirical frequencies of each player's choices converge to some (mixed)
strategy. The corresponding strategy profile is also a Nash equilibrium; this is the
traditional sense in which fictitious play is said to converge to mixed-strategy
equilibria. Section 3 briefly reviews results from the literature that use these two
convergence notions.

In Section 4, we generalize fictitious play by considering more general
assumptions about the ways in which players construct their assessments and then
choose their immediate actions. We will assume throughout that players' choices
of actions are asymptotically myopic; i.e., in the long run, players choose in a way
that maximizes their immediate payoffs. This assumption requires some
explanation and rationale, which we provide. As for players' assessments, if they
are adaptive (following Milgrom and Roberts [1991]), then if intended play
converges to a pure-strategy profile, the profile must be a Nash equilibrium, for any
number of players. But more is required if the second form of convergence,
convergence of empirical frequencies to a mixed-strategy profile, is to have only
Nash equilibria as limit points. A sufficient condition is that assessment rules are
asymptotically empirical, which means that players' assessments converge together
with empirical frequencies. Moreover, this condition suffices only for two-player
games.

Section 5 presents several objections to convergence of empirical frequencies
as an appropriate mode of convergence for learning to play a mixed-strategy
profile. In summary, these objections are: (1) Although assessments are converging
(if they are asymptotically empirical), the strategies that are chosen are not. (2)
In examples, correlations will be observed (over time) in the actions of players
who choose their actions independently. (3) Because of (2), convergence in this
sense for games with more than two players is problematic.

For these reasons, in Section 6 we propose a stronger mode of convergence,
namely convergence of (intended) behavior. This raises some technical difficulties:
When players use mixed strategies, the realized distribution of play need not
equal the intended one, which makes notions of convergence inherently
probabilistic. These difficulties are attended to, and then we show that only Nash
equilibria are possible limit points under this mode of convergence, as long as
behavior is asymptotically myopic and assessments are asymptotically empirical.
Moreover, for any game and Nash equilibrium of the game, there is a model
of asymptotically myopic behavior and asymptotically empirical assessments for
which the equilibrium is a limit point of intended behavior with probability as
close to one as desired.2 These results are not limited to two-player games.

The problem with this alternative mode of convergence is that, while
convergence to mixed behavior is possible, it is hard to see why it should occur.
The difficulty is the standard one with mixed strategies: If, based on their
assessments, players choose their actions to maximize precisely their expected
payoffs, then (unless their assessments are precisely those of the mixed equilibrium)
their intended behavior will not converge. If players are not restricted to precise
maximization, then any behavior (that puts weight on actions in the
equilibrium mixture) will be satisfactory. Something outside of payoff-maximization
considerations is required to lead players to the precise mixtures needed in the
equilibrium. In our basic formulation, we see no natural way of doing this.

As a way around this problem, in Sections 7 and 8 we consider learning in
games in which each player's payoff is subject to a sequence of i.i.d. random
shocks that are observed only by that player, as suggested by Harsanyi's (1973)
purification theorem. In this context, a mixture over two strategies does not
correspond to mixing by a player who is indifferent, but rather to a player who (in
each period) strictly prefers one of the two strategies, depending on the (period's)
realization of the player's payoff perturbation. Yet from the perspective of other
players, who do not know the precise value of this period's perturbation for the
player, the actions of the first player are random. We show in Section 7 that
in this context all of our earlier results go through without difficulty. Then, in
Section 8, we specialize to the class of 2 x 2 games with a unique equilibrium in
mixed strategies, and we show that any learning process that is close to fictitious
play (in a sense to be made precise) will converge with probability one to the
unique equilibrium. Here we use and adapt results from the theory of stochastic
approximation (Arthur et al., 1987; Kushner and Clark, 1978; and Ljung and
Soderstrom, 1983).3

2 To the extent that some Nash equilibria seem unreasonable, such as those where players use
weakly dominated strategies, this last result indicates that our assumptions are too weak. This will
be discussed as well in Section 6.

Before setting out, let us note that the results given here are only a small part
of the overall story. Among other things, we are assuming that players encounter
the same opponents repeatedly and yet act myopically (a fairly unsavory
combination; see Section 4), that they observe the full (stage-game pure) strategies
chosen by rivals in each round of play, and that they play the same game over and over.
We hope to return to each of these three simplifying assumptions in subsequent
work.

2.  Formulation 

Fix an $I$-player, finite, strategic-form game, hereafter referred to as the stage
game. The players are indexed $i = 1, 2, \ldots, I$, and we let $-i$ denote the "other"
players; i.e., $-i = \{1, 2, \ldots, i-1, i+1, \ldots, I\}$.4 Let $S^i$, $i = 1, \ldots, I$, be the finite
set of pure strategies (or actions) for player $i$, let $S = S^1 \times \cdots \times S^I$ be the set of
pure-strategy profiles, and let $u^i : S \to \mathbb{R}$ give player $i$'s payoffs. In the usual
fashion, let $\Sigma^i$ be the mixed strategies for player $i$, let $\Sigma = \Sigma^1 \times \cdots \times \Sigma^I$ be the
set of mixed-strategy profiles, and extend the domain of $u^i$ from $S$ to $\Sigma$. Also,
for each $i$ let $S^{-i}$ denote $\prod_{j \ne i} S^j$, and let $\Sigma^{-i}$ denote the set of probability
distributions over $S^{-i}$; for $s^i \in S^i$ and $\sigma^{-i} \in \Sigma^{-i}$, let $u^i(s^i, \sigma^{-i})$ denote $i$'s
expected utility if she chooses pure strategy $s^i$ and her rivals act according to
the (possibly correlated) distribution $\sigma^{-i}$.

3 Thus our work is similar to that of Marcet and Sargent (1989a, 1989b) on learning rational
expectations equilibrium.

4 We use male pronouns for players in general, and for players numbered 2, 4, 6, etc. and lettered
j and -j. We use female pronouns for players numbered 1, 3, etc., and for players lettered i and k.

Imagine that these players play the game repeatedly, at dates $t = 1, 2, \ldots$.
Imagine that after each round of play, players observe the actual actions chosen
by their opponents; i.e., the pure strategy that is chosen is observed. If a player
chooses his action using a mixed strategy, the mixing is not observed. Then a
history of play up to time $t$, denoted $\zeta_t$, is a string of (pure) strategy profiles
$\zeta_t = (s_1, \ldots, s_{t-1})$, where $s_{t'} \in S$ for $t' = 1, \ldots, t-1$. The set of all histories of
play up to time $t$, or $(S)^{t-1}$, is denoted by $Z_t$.5 By convention, $Z_1$ will denote
the (singleton) set consisting of the null history. Also, $Z$ will denote the set of
all possible infinite histories; i.e., $Z = (S)^\infty$, with typical element $\zeta = (s_1, s_2, \ldots)$.

The basic object of this paper is a model of learning and behavior, which specifies
how the players behave and what they believe as time passes. A model of
learning and behavior consists formally of two pieces, behavior rules and assessment
rules for each player. We take these in turn.

Behavior  rules 

We will denote by $\phi^i$ the behavior rule that player $i$ uses in the infinitely
repeated game. That is, $\phi^i = (\phi^i_1, \phi^i_2, \ldots)$, where $\phi^i_t : Z_t \to \Sigma^i$. The notation $\phi$
(for a profile $(\phi^1, \ldots, \phi^I)$ of behavior rules for the players), $\phi_t$ (for the profile of
behavior rules at date $t$ as a function of $\zeta_t$), and $\phi_t(\zeta_t)$ will all be used.


5 Insofar as possible, we follow the convention that subscripts refer to time and superscripts to
players. When we write $(\cdot)^t$, however, we mean the usual $t$-fold Cartesian product of the argument
within the parentheses. (We will try to avoid this as much as possible.)


Fix a profile of behavior rules $\phi$. Given any $t \ge 1$ and $\zeta_t \in Z_t$, we can use $\phi$
to construct a conditional probability distribution (conditional on $\zeta_t$) for the rest of
the path of play in the usual fashion: $\zeta_t$ and $\phi$ give a probability distribution on
$s_t$ (the actual play at date $t$) via $\phi_t(\zeta_t)$. This gives us conditional probabilities on
$Z_{t+1}$, and with transition probabilities for $s_{t+1}$ given by $\phi_{t+1}(\zeta_{t+1})$, we can extend
the conditional distribution to $Z_{t+2}$, and so on. These then give a probability
distribution over the space of complete histories $Z$ by the Kolmogorov extension
theorem. We will write $P(\cdot \mid \zeta_t)$ for this conditional probability, keeping in mind
that this is for a fixed profile of behavior strategies.

One part of this construction must be emphasized. Given history $\zeta_t$, the
probability that $i$ plays $s^i$ at time $t$ is $\phi^i_t(\zeta_t)(s^i)$. When we construct the
conditional probability distribution on $\zeta$, we must specify the joint probability that
1 plays $s^1$, 2 plays $s^2$, and so on. We insist that

$$P(s_t = (s^1, \ldots, s^I) \mid \zeta_t) = \phi^1_t(\zeta_t)(s^1) \times \cdots \times \phi^I_t(\zeta_t)(s^I).$$

That  is,  players  randomize  (in  their  behaviors)  independently. 
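
The construction above can be sketched in code. This is only an illustration under assumptions that are not in the paper: behavior rules are represented as Python callables from a history (a tuple of past profiles) to a dictionary of action probabilities, and the two example rules and the helper names are hypothetical. The joint probability of a profile is the product of the individual probabilities, and a path of play is drawn one date at a time.

```python
import random
from itertools import product

def joint_distribution(rules, history):
    """Distribution of the date-t profile given the history, assuming
    that players randomize independently (product of the marginals)."""
    marginals = [rule(history) for rule in rules]
    joint = {}
    for profile in product(*(m.keys() for m in marginals)):
        p = 1.0
        for marginal, action in zip(marginals, profile):
            p *= marginal[action]
        joint[profile] = p
    return joint

def sample_history(rules, periods, rng):
    """Draw a path of play one date at a time; each player's draw is independent."""
    history = ()
    for _ in range(periods):
        marginals = [rule(history) for rule in rules]
        profile = tuple(
            rng.choices(list(m), weights=list(m.values()))[0] for m in marginals
        )
        history += (profile,)
    return history

# Hypothetical history-independent rules for two players.
def rule_1(history):
    return {"a": 0.75, "b": 0.25}

def rule_2(history):
    return {"a'": 0.5, "b'": 0.5}

joint = joint_distribution([rule_1, rule_2], ())
path = sample_history([rule_1, rule_2], 5, random.Random(0))
```

Here `joint[("a", "a'")]` is 0.75 times 0.5, and `path` is one realized five-period history; rules that depend on the history can be passed in the same way.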

Assessments 

To model the behavior rules of players, we will employ some ancillary
formalisms. Specifically, we will want to speak of what each player assesses
concerning the behavior of her rivals, at each date $t$ and contingent on each possible
history $\zeta_t$.

For the analysis in this paper, it will suffice to specify (for each player $i$, time
$t$, and partial history $\zeta_t$) what $i$ believes her rivals will do in the round about to
be played. Formally, for each $t$, let $\mu^i_t$ denote a function with domain $Z_t$ and
range $\Sigma^{-i}$, representing $i$'s assessment over the possible pure strategy profiles
that her rivals will choose at date $t$, as a function of $\zeta_t$. Also, we use $\mu^i$ to
denote a full system of assessments or assessment rule for $i$; i.e., $\mu^i$ is a sequence
$(\mu^i_1, \mu^i_2, \ldots)$. Note well that $\sigma^{-i} \in \Sigma^{-i}$ encodes more than $i$'s marginal
assessments for her rivals' behavior; $i$ is allowed to make an assessment concerning
the joint behavior of her rivals that admits correlations in their play.6 This may
at first seem troubling when contrasted with the independence assumption made
in the earlier construction of the probability measures $P$. There is no conflict,
however. The measure $P$ reflects the objective probability measure that governs
the evolution of play, as a function of the behavior rules used by the players. We do
not allow players to correlate their (mixed) strategies at any date, hence $P$ is
constructed with independence at each date. On the other hand, the $\mu^i_t(\zeta_t)$
represent a player's subjective assessment of what her rivals are about to do; unless
and until $i$ knows what behavior rules her rivals are using, correlation in her
assessments can reflect her strategic uncertainty.7

6 For example, imagine a three-player game in which player 1 has a choice between pure
strategies $a$ and $b$ and player 2 has a choice between $a'$ and $b'$. Imagine that player 3 thinks that
player 1 either mixes between $a$ and $b$ with probabilities 3/4 and 1/4 or with probabilities 1/4
and 3/4, with the same mixing probabilities used at each date, irrespective of what is the history
of play. That is, player 3 believes that player 1 is either using the behavior rule $\hat\phi^1$ that is given by
$\hat\phi^1_t(\zeta_t)(a) \equiv 3/4$ (where the $\equiv$ means irrespective of the values of $t$ and $\zeta_t$), or she believes that 1
uses $\tilde\phi^1$ given by $\tilde\phi^1_t(\zeta_t)(a) \equiv 1/4$. Player 3 entertains similar beliefs about the behavior rule used
by player 2; we use $\hat\phi^2$ and $\tilde\phi^2$ to denote the two possibilities. Moreover, player 3 believes that her
rivals randomize at each date independently; i.e., if player 1 is using $\hat\phi^1$ and player 2 is using $\hat\phi^2$,
then player 3 assesses that the probability of $(a, a')$ in any round is (3/4)(3/4) = 9/16. Imagine that
player 3 initially believes that if player 1 is using $\hat\phi^1$, then it is more likely that player 2 is using $\hat\phi^2$.
Specifically, player 3's prior belief is that 1 will use $\hat\phi^1$ and 2 will use $\hat\phi^2$ with probability .4; 1 will
use $\hat\phi^1$ and 2 will use $\tilde\phi^2$ with probability .05; 1 will use $\tilde\phi^1$ and 2 will use $\hat\phi^2$ with probability
.05; and 1 will use $\tilde\phi^1$ and 2 will use $\tilde\phi^2$ with probability .4. And, finally, imagine that 3 uses the
sequence of observed play and Bayes' rule to update her beliefs about the joint behavior rule profile
used by her rivals. With these data, we can integrate out to find 3's assessment about what 1 and
2 will do at any date $t$, given any history $\zeta_t$. It is evident that although 3 believes that 1 and 2
are randomizing independently, her initial uncertainty about what behavior rules they are using and
the correlation in her initial beliefs about their behavior rule profile imply that she will be making
assessments about their play at each date that reflect correlation. If we condition on player 1 playing
$a$ at date 1, this makes it more likely that player 1 is using $\hat\phi^1$ than $\tilde\phi^1$, which makes it more likely
that player 2 is using $\hat\phi^2$, which makes $a'$ more likely. Note as well that even though player 3
believes (with probability one) that her rivals do not change their behavior from date to date as a
function of what happens in the course of play, her assessments $\mu^3_t(\zeta_t)$ very much depend on $\zeta_t$,
since the history of play up to date $t$ gives player 3 information about what behavior rule profile
her rivals are in fact using.
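
The three-player example in the footnote can be checked numerically. The sketch below makes several assumptions: the labels "high" (first action played with probability 3/4) and "low" (probability 1/4) are names invented here for the two candidate rules of each rival, and because the four prior weights as printed (.4, .05, .05, .4) sum to .9, the sketch renormalizes them before use. With those caveats, it computes player 3's joint assessment for one round and applies Bayes' rule after one observation.

```python
from itertools import product

# Probability of the first action (a for player 1, a' for player 2) under each
# candidate rule. The labels "high" and "low" are made up for this sketch.
P_FIRST = {"high": 0.75, "low": 0.25}

# Prior weights as printed, keyed by (rule of player 1, rule of player 2).
# They sum to 0.9, so they are renormalized below (an assumption of this sketch).
PRIOR = {
    ("high", "high"): 0.4,
    ("high", "low"): 0.05,
    ("low", "high"): 0.05,
    ("low", "low"): 0.4,
}

def normalize(weights):
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

def likelihood(rules, actions):
    """Probability of one round's actions if the rivals mix independently.
    Action 0 is the first action (a or a'), action 1 the second."""
    p = 1.0
    for rule, action in zip(rules, actions):
        p *= P_FIRST[rule] if action == 0 else 1.0 - P_FIRST[rule]
    return p

def assessment(belief):
    """Player 3's joint assessment over (action of 1, action of 2)."""
    return {
        actions: sum(w * likelihood(rules, actions) for rules, w in belief.items())
        for actions in product((0, 1), repeat=2)
    }

def update(belief, actions):
    """Bayes' rule after observing one round of play."""
    return normalize(
        {rules: w * likelihood(rules, actions) for rules, w in belief.items()}
    )

belief = normalize(PRIOR)
mu = assessment(belief)
p_a = mu[(0, 0)] + mu[(0, 1)]          # marginal probability that 1 plays a
p_a_prime = mu[(0, 0)] + mu[(1, 0)]    # marginal probability that 2 plays a'
posterior = update(belief, (0, 0))     # beliefs after observing (a, a')
```

Under these assumptions the assessed probability of (a, a') exceeds the product of the two marginal probabilities, which is the correlation the footnote describes, and observing (a, a') shifts weight toward the pair of "high" rules.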

3.  Fictitious  play 

The  model  of  fictitious  play  (Brown,  1951;  Robinson,  1951)  can  be  viewed  as 
a  model  of  learning  and  behavior.  First  we  give  the  details  of  fictitious  play,  and 
then  we  discuss  its  interpretation  as  a  model  of  learning  and  behavior. 

In fictitious play, there are two players; i.e., $I = 2$. In this setting, we
interpret $-i$ as "not $i$"; i.e., $-i = 3 - i$ for $i = 1, 2$. Otherwise, the general
setting is just as in Section 2. The behavior and assessment rules are built up as
follows.

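(A) For each player $i$, strategy $s^i \in S^i$, and history $\zeta_t$, let $\kappa(\zeta_t)(s^i)$ be the
number of times that $i$ played $s^i$ in the $t - 1$ observations that comprise $\zeta_t$.8

(B) For each player $i$, there is an "initial weight" function $\eta^i : S^{-i} \to [0, \infty)$ such
that $\sum_{s^{-i} \in S^{-i}} \eta^i(s^{-i}) > 0$.

(C) For each player $i$, date $t \ge 1$, history $\zeta_t$, and strategy $s^{-i} \in S^{-i}$, we define
$\eta^i_t(\zeta_t)(s^{-i}) = \eta^i(s^{-i}) + \kappa(\zeta_t)(s^{-i})$. (Note that $\eta^i_1(\zeta_1)(s^{-i}) = \eta^i(s^{-i})$ for all $s^{-i}$.)
Then player $i$'s assessment rule $\mu^i$ is given by normalizing the $\eta^i_t$; i.e.,

$$\mu^i_t(\zeta_t)(s^{-i}) = \frac{\eta^i_t(\zeta_t)(s^{-i})}{\sum_{\hat s^{-i} \in S^{-i}} \eta^i_t(\zeta_t)(\hat s^{-i})}.$$

(D) For player $i$, at each date $t$ with history $\zeta_t$, $\phi^i_t(\zeta_t)$ is a maximizer of

$$\sum_{s^{-i} \in S^{-i}} u^i(\sigma^i, s^{-i}) \, \mu^i_t(\zeta_t)(s^{-i}) \qquad (3.1)$$

over all $\sigma^i \in \Sigma^i$.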


7 We are grateful to Bob Aumann for convincing us of how important this is.

8 We do not bother to write $\kappa^i_t$, since the two arguments determine the length of the history
and the player whose strategy is being counted.


In (D), we have not pinned down the definition of $\phi^i_t(\zeta_t)$ when there is
more than one maximizer of (3.1). We do require that $\phi^i_t(\zeta_t)$ make a particular
prescription in such cases (which, of course, can be a mixed strategy), but we do
not say what it is. Formally, we would say that a model of learning and behavior
is consistent with the model of fictitious play if there are initial weight functions
$\eta^i$ such that (C) holds as a definition of the assessment rules $\mu^i$ and (D) holds
as a condition on the behavior rules $\phi^i$.
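
Steps (A) through (D) can be sketched for a single player as follows. The function names, the list-based representation, and the matching-pennies-style payoffs in the example are assumptions made here for illustration. The sketch returns the whole set of maximizers of (3.1) among pure strategies, so that ties are visible instead of being broken silently.

```python
def assessment(initial_weights, counts):
    """Step (C): add observed counts to the initial weights and normalize."""
    weights = [w + k for w, k in zip(initial_weights, counts)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_payoffs(payoff_rows, mu):
    """payoff_rows[s][r] is the player's payoff from own pure strategy s
    against rival pure strategy r; mu is the assessment over rival strategies."""
    return [sum(u * p for u, p in zip(row, mu)) for row in payoff_rows]

def best_responses(payoff_rows, mu, tol=1e-12):
    """Step (D): all pure maximizers of (3.1); any mixture over them is also optimal."""
    values = expected_payoffs(payoff_rows, mu)
    best = max(values)
    return [s for s, v in enumerate(values) if v >= best - tol]

# Hypothetical payoffs for one player (not a game from the paper).
payoffs = [[1, -1], [-1, 1]]

mu_start = assessment([1, 1], [0, 0])   # equal initial weights, no observations
mu_later = assessment([1, 1], [1, 0])   # rival has played strategy 0 once
tied = best_responses(payoffs, mu_start)
unique = best_responses(payoffs, mu_later)
```

With equal weights both pure strategies maximize (3.1), which is exactly the case in which the behavior rule must make some unspecified prescription; after one observation the maximizer is unique.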

We  trust  that  most  readers  will  be  familiar  with  the  model  of  fictitious  play, 
but  it  may  help  the  uninitiated  to  give  a  simple  example.  Imagine  two  players 
who  repeatedly  play  the  strategic-form  game  in  Figure  3-1,  with  player  1  choosing 
a  row  and  player  2  a  column.  We  assume  that  the  game  begins  with  the  players 
holding  "beliefs" 

$\eta^1 = (1, 0, 4.32)$ and $\eta^2 = (3, 5.7)$,

where we write these functions as vectors with the understanding that the first
component of $\eta^1$ corresponds to column 1, the second component to column 2,
and so on.


                           Player 2
               Column 1    Column 2    Column 3
Player 1
   Row 1         5,1         8,4.7       2,3
   Row 2         2,3         2,1         4,2

Fig. 3-1. A strategic-form game.


Refer to the first three lines of Table 3-1, which are labeled round number
1. The second line gives data for player 1; first her relative beliefs about what
player 2 will do in the first round (i.e., the vector $\eta^1$); and next the expected
payoffs she will accrue given those beliefs if she chooses row 1 and then row 2.
Row 2 gives the higher payoff, and that is written down as her choice. Similarly,
given 2's beliefs about what player 1 will do (the vector $\eta^2$), 2's best choice is
column 3.

                  "Beliefs" about rival     Expected payoffs        Choice

Round number 1
  Player 1        1     0     4.32          2.56    3.62            row 2
  Player 2        3     5.7                 2.31    2.28    2.34    column 3

Round number 2
  Player 1        1     0     5.32          2.47    3.68            row 2
  Player 2        3     6.7                 2.38    2.14    2.31    column 1

Round number 3
  Player 1        2     0     5.32          2.82    3.45            row 2
  Player 2        3     7.7                 2.44    2.04    2.28    column 1

Round number 4
  Player 1        3     0     5.32          3.08    3.28            row 2
  Player 2        3     8.7                 2.49    1.95    2.26    column 1

Table 3-1. An example of fictitious play.

Move to the second three lines, labeled round number 2. Player 1's beliefs
about the actions of player 2 are changed to reflect what happened in the first
round. Since player 2 chose column 3 in the first round, the entry for column
3 in 1's beliefs is increased by 1. (That is, $\eta^1_2 = (1, 0, 5.32)$.) We recompute the
expected payoffs to player 1 of playing either row, using these reassessed beliefs,
and we see that row 2 continues to be player 1's best choice. But player 2 now
finds that column 1 is optimal, when his beliefs are changed to reflect player 1's
choice of row 2 in the first period. Hence in the second round row 2 and column
1 are chosen. This gives beliefs for round number 3, and so on.
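
The round-by-round bookkeeping of this example can be reproduced with a short simulation. This is a sketch, not the authors' code: the payoff matrices are read off Figure 3-1, indices are zero-based (so row 2 is index 1), and ties, which do not arise in this example, would be broken in favor of the lowest index.

```python
# Payoffs read off Figure 3-1, indexed [row][column].
U1 = [[5, 8, 2], [2, 2, 4]]      # player 1 (row player)
U2 = [[1, 4.7, 3], [3, 1, 2]]    # player 2 (column player)

def fictitious_play(u1, u2, weights1, weights2, rounds):
    """weights1: player 1's weights over columns; weights2: player 2's weights
    over rows. Returns one record per round: the beliefs held at the start of
    the round, the expected payoffs they imply, and the chosen row and column."""
    w1, w2 = list(weights1), list(weights2)
    n_rows, n_cols = len(u1), len(u1[0])
    log = []
    for _ in range(rounds):
        total1, total2 = sum(w1), sum(w2)
        pay1 = [sum(u1[r][c] * w1[c] for c in range(n_cols)) / total1
                for r in range(n_rows)]
        pay2 = [sum(u2[r][c] * w2[r] for r in range(n_rows)) / total2
                for c in range(n_cols)]
        row = max(range(n_rows), key=pay1.__getitem__)
        col = max(range(n_cols), key=pay2.__getitem__)
        log.append({"beliefs1": tuple(w1), "beliefs2": tuple(w2),
                    "payoffs1": pay1, "payoffs2": pay2,
                    "row": row + 1, "col": col + 1})
        w1[col] += 1   # player 1 observes the column played
        w2[row] += 1   # player 2 observes the row played
    return log

log = fictitious_play(U1, U2, (1, 0, 4.32), (3, 5.7), 8)
choices = [(rec["row"], rec["col"]) for rec in log]
```

The first four records correspond to the four rounds of Table 3-1, and `choices` lists the (row, column) pair played in each of the first eight rounds.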

Fictitious play was not originally advanced as a model of how individuals
would behave (and learn) when playing a game repeatedly; it was advanced
instead as a method for computing Nash equilibria9 or perhaps as a model
of the preplay thought process of individual players. How well does it stand
as a model of learning and behavior? The following two questions are raised
immediately.

9 The connection will become clear in a bit.

(1) Is there any particular sense to how assessments are being formed? It can be shown
that the assessment rules $\mu^i$ are consistent with a Bayesian model in which each
player believes her rival is playing the same (unknown) mixed strategy in each
round, independent of what came before, and where each player's prior
assessment concerning this unknown behavior strategy has a Dirichlet distribution.

(2)  Is  it  sensible  or  realistic  to  assume  that  players  would  behave  myopically,  in  the  sense 
that,  in  each  round,  they  choose  a  strategy  that  maximizes  their  immediate  expected  payoff, 
given  their  assessments?  Behavior  that  is  myopic  in  this  sense  will  be  discussed 
in  Section  4,  so  for  now  we  only  note  that  if  each  player  believes  that  his  rival 
does  not  respond  to  the  history  of  play  —  as  posited  in  our  answer  to  question 
(1)  just  preceding  —  then  myopic  behavior  in  this  fashion  is  warranted. 
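
The Bayesian reading in the answer to question (1) can be made concrete with the standard Dirichlet-multinomial calculation. The sketch below writes $\eta^i$ for the initial weights and $\kappa(\zeta_t)$ for the observed counts, as in (A) to (C), and assumes that every initial weight is strictly positive so that the prior is a proper Dirichlet density; zero weights would need a limiting argument that is not attempted here.

```latex
% Assumption: \eta^i(s^{-i}) > 0 for all s^{-i}, so the Dirichlet prior is proper.
\[
  f(\sigma^{-i}) \;\propto\; \prod_{s^{-i} \in S^{-i}}
      \sigma^{-i}(s^{-i})^{\,\eta^i(s^{-i}) - 1}
  \quad\Longrightarrow\quad
  f(\sigma^{-i} \mid \zeta_t) \;\propto\; \prod_{s^{-i} \in S^{-i}}
      \sigma^{-i}(s^{-i})^{\,\eta^i(s^{-i}) + \kappa(\zeta_t)(s^{-i}) - 1},
\]
\[
  \Pr\bigl(s^{-i}_t = s^{-i} \mid \zeta_t\bigr)
  \;=\; \frac{\eta^i(s^{-i}) + \kappa(\zeta_t)(s^{-i})}
             {\sum_{\hat s^{-i} \in S^{-i}}
                \bigl[\eta^i(\hat s^{-i}) + \kappa(\zeta_t)(\hat s^{-i})\bigr]}
  \;=\; \mu^i_t(\zeta_t)(s^{-i}).
\]
```

The predictive probability is the posterior mean of the rival's mixing probability, and it coincides with the normalized weights of step (C).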

Accepting the model as, at least, a very specific but interesting
parameterization of learning and behavior, we can ask about its long-run implications. One
possibility arises in the example of Figure 3-1 and Table 3-1; if we follow this out
until round 8, then in round 8 play reaches the profile row 1-column 2. Since
this pair is a strict Nash equilibrium for the game, increasing the weight on row
1 in player 2's assessment and increasing the weight on column 2 in player 1's
assessment only increases the optimality of column 2 and row 1, respectively.
Thus play "gets stuck" at this pure-strategy Nash equilibrium. In general,

Proposition  3.0.  In  any  history  generated  by  fictitious  play,  if  a  strategy  profile  is  played 
that  is  a  strict  Nash  equilibrium,  then  all  subsequent  play  will  be  that  strategy  profile. 

Or,  speaking  very  loosely,  strict  Nash  equilibria  are  absorbing  for  play  according 
to  the  model  of  fictitious  play.  A  related  observation  is  the  following. 

Proposition  3.1.  Suppose  that  in  some  history  generated  by  fictitious  play,  a  particular 
pure-strategy  profile  is  played  for  all  but  a  finite  number  of  periods.  Then  that  strategy 
profile  must  be  a  Nash  equilibrium. 
We refrain from giving the proof here; this is an easy corollary to Proposition 4.1,
which  is  proved  later. 

Thus  we  see  one  possibility;  play  might  "stick"  at  some  pure-strategy  profile. 
If  so,  this  profile  must  be  a  Nash  equilibrium.  (It  goes  almost  without  saying  that 
judicious  choice  of  the  initial  weight  functions  will  allow  fictitious  play  to  stick 
at  any  strict  equilibrium.  Depending  on  how  ties  are  broken,  this  is  true  as  well 
of  any  equilibrium,  even  those  in  weakly  dominated  strategies.) 

Proposition 3.1 implies that fictitious play cannot converge to a single
pure-strategy profile in games that have no pure-strategy equilibria. Moreover, even
in games that do have pure-strategy equilibria, fictitious play may fail to lock
on to a single pure-strategy profile. For example, take the game in Figure 3-1,
and change the entry 4.7 in row 1-column 2 to a 4, giving the game in Figure
3-2. Note that row 1-column 2 is still a strict Nash equilibrium. Begin fictitious
play with the same initial weight vectors as before, and it turns out that column 2
will never be played. Instead, play "cycles" around the best response cycle row
1-column 1 to row 1-column 3 to row 2-column 3 to row 2-column 1, where
"cycles" is put in quotes because the periods of the cycles increase through time.
In the limit, however, the relative frequencies of play of the various strategies
converge. That is, player 1 plays row 1 one third of the time in the limit, and
player 2 plays column 1 two fifths of the time and column 3 three fifths of the
time. Hence the players' beliefs about how each other will be playing converge
to the corresponding mixed strategies. It is straightforward to see that these
mixed strategies constitute a mixed Nash equilibrium. More generally, we have
Proposition 3.2.

Proposition 3.2. Suppose that in some history generated by fictitious play, the empirical
frequencies  of  pure-strategy  choices  converge  to  some  (mixed)  strategy  profile.  Then  that 
strategy  profile  is  a  Nash  equilibrium. 

The proof is omitted for now; this is a corollary of Proposition 4.2. Note that this
proposition implies Proposition 3.1 as a special case.
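
The cycling example that motivates Proposition 3.2 can be explored numerically. The sketch below assumes the modified game described in the text (the Figure 3-1 payoffs with 4.7 replaced by 4), the same initial weights as in Table 3-1, a horizon of 20,000 rounds chosen arbitrarily, and lowest-index tie-breaking (no ties arise with these weights). It reports empirical frequencies only; it is not a proof of convergence.

```python
# Payoffs of the modified game: Figure 3-1 with the entry 4.7 replaced by 4.
U1 = [[5, 8, 2], [2, 2, 4]]    # player 1 (row player), indexed [row][column]
U2 = [[1, 4, 3], [3, 1, 2]]    # player 2 (column player)

def empirical_frequencies(u1, u2, weights1, weights2, rounds):
    """Run fictitious play and return the empirical frequencies of the rows
    and of the columns actually played (initial weights are not counted)."""
    w1, w2 = list(weights1), list(weights2)
    n_rows, n_cols = len(u1), len(u1[0])
    row_counts, col_counts = [0] * n_rows, [0] * n_cols
    for _ in range(rounds):
        # Unnormalized expected payoffs; normalizing would not change the argmax.
        pay1 = [sum(u1[r][c] * w1[c] for c in range(n_cols)) for r in range(n_rows)]
        pay2 = [sum(u2[r][c] * w2[r] for r in range(n_rows)) for c in range(n_cols)]
        row = max(range(n_rows), key=pay1.__getitem__)
        col = max(range(n_cols), key=pay2.__getitem__)
        row_counts[row] += 1
        col_counts[col] += 1
        w1[col] += 1
        w2[row] += 1
    return ([k / rounds for k in row_counts], [k / rounds for k in col_counts])

row_freq, col_freq = empirical_frequencies(U1, U2, (1, 0, 4.32), (3, 5.7), 20000)
```

The returned frequencies can be compared with the mixed equilibrium described in the text, in which row 1 has probability 1/3 and columns 1 and 3 have probabilities 2/5 and 3/5. Convergence is slow, so at any finite horizon the match is only approximate.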

It is natural to ask whether, in every game and for every set of initial
conditions, convergence at least in the sense of Proposition 3.2 will take place
under fictitious play. There are entirely trivial reasons why convergence may fail,
connected with the way in which ties (among optimal strategy choices) are
broken. However, if due care is taken in dealing with ties, then it is known that
convergence in this sense is ensured for zero-sum games (Robinson, 1951) and
two-by-two games (Miyasawa, 1961). For general games, however, convergence
is not ensured; the first (nontrivial) example is given in Shapley (1964).

4.  Extensions  of  fictitious  play 

One problem with the model of fictitious play is its very rigid, ad hoc
specification. Assessments are formed according to the empirical frequencies of past
play (up to the initially given weight vectors), and actions are chosen to
maximize precisely immediate expected payoffs. Neither part of this specification is
essential to the results given above; we can obtain similar results for a broad class
of models of learning and behavior. In this section, we present some results of
this sort.

Myopic  behavior 

Definitions. Given an assessment rule $\mu^i = (\mu^i_1, \mu^i_2, \ldots)$ for player i, we say that
the behavior rule $\phi^i = (\phi^i_1, \phi^i_2, \ldots)$ for i is myopic relative to $\mu^i$ if, for every t and
$\zeta_t$, $\phi^i_t(\zeta_t)$ maximizes i's immediate expected payoff, given assessment $\mu^i_t(\zeta_t)$. That is,

$$u^i(\phi^i_t(\zeta_t), \mu^i_t(\zeta_t)) = \max_{s^i \in S^i} u^i(s^i, \mu^i_t(\zeta_t)).$$

The behavior rule $\phi^i$ is asymptotically myopic relative to $\mu^i$ if for some sequence
of strictly positive numbers $\{\epsilon_t\}$ with limit zero, for every t and $\zeta_t$, $\phi^i_t(\zeta_t)$ comes
within $\epsilon_t$ of maximizing i's immediate expected payoff, given assessment $\mu^i_t(\zeta_t)$. That is,

$$u^i(\phi^i_t(\zeta_t), \mu^i_t(\zeta_t)) + \epsilon_t \geq \max_{s^i \in S^i} u^i(s^i, \mu^i_t(\zeta_t)).$$

The behavior rule $\phi^i$ is strongly asymptotically myopic relative to $\mu^i$ if for
some sequence of strictly positive numbers $\{\epsilon_t\}$ with limit zero, for every t and $\zeta_t$,
every $s^i$ in the support of $\phi^i_t(\zeta_t)$ comes within $\epsilon_t$ of maximizing i's immediate expected
payoff, given assessment $\mu^i_t(\zeta_t)$. That is,

$$u^i(s^i, \mu^i_t(\zeta_t)) + \epsilon_t \geq \max_{\hat{s}^i \in S^i} u^i(\hat{s}^i, \mu^i_t(\zeta_t))$$

for all $s^i$ in the support of $\phi^i_t(\zeta_t)$.

Note  that  in  asymptotically  myopic  behavior,  the  player  can  use  slightly  sub- 
optimal  pure  strategies  with  large  probability,  or  he  can  use  grossly  suboptimal 
pure  strategies  with  small  probability,  or  both,  as  long  as  the  "average"  subopti- 
mality,  averaged  according  to  the  probabilities  with  which  the  pure  strategies  are 
played,  is  small  enough.  In  strong  asymptotic  myopia,  grossly  suboptimal  pure 
strategies  cannot  be  used  at  all. 
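
As a rough illustration of the difference between the two notions (a sketch only, not part of the formal development), the following Python fragment computes the two shortfalls for a single date. The payoff matrix `U`, the assessment `mu`, and the mixed action `phi` are hypothetical values chosen for the example: one pure strategy is worse than the other by exactly 1, and it is played with probability 1/t.

```python
def expected_payoffs(U, mu):
    """Expected payoff of each own pure strategy against the assessment mu.

    U[s][r] is the (hypothetical) payoff from own pure strategy s when the
    rival plays r; mu[r] is the assessed probability of r.
    """
    return [sum(U[s][r] * mu[r] for r in range(len(mu))) for s in range(len(U))]


def average_shortfall(U, phi, mu):
    """Shortfall of the mixed action phi as a whole: the quantity that must
    be at most eps_t under asymptotic myopia."""
    v = expected_payoffs(U, mu)
    return max(v) - sum(p * x for p, x in zip(phi, v))


def worst_support_shortfall(U, phi, mu):
    """Largest shortfall among pure strategies played with positive
    probability: the quantity that must be at most eps_t under strong
    asymptotic myopia."""
    v = expected_payoffs(U, mu)
    return max(max(v) - x for p, x in zip(phi, v) if p > 0)


# Strategy 1 beats strategy 0 by exactly 1 against any assessment.
U = [[0.0, 0.0], [1.0, 1.0]]
mu = [0.5, 0.5]
t = 100
phi = [1 / t, 1 - 1 / t]  # the bad strategy is played with probability 1/t

avg = average_shortfall(U, phi, mu)          # roughly 1/t, vanishing in t
worst = worst_support_shortfall(U, phi, mu)  # stays at 1 for every t
```

Under these assumed numbers the average shortfall shrinks like 1/t while the worst shortfall in the support does not shrink at all, so the rule would be asymptotically myopic but not strongly so.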

We will work throughout with models of learning and behavior for which
behavior is at least asymptotically myopic with respect to the assessment rules.
Even this less restrictive assumption has one feature that is potentially
troublesome: It implicitly supposes that players do not try (asymptotically) to influence
the future play of their opponents. To see this, consider the game in Figure 4-1,
and imagine that player 2 selects actions according to the model of fictitious
play. In this game row 2 is dominant for player 1, and so if player 1's behavior
is asymptotically myopic for any assessment rule, she will play row 2 eventually.
Since player 2 uses the assessment and behavior rules of fictitious play, he will
eventually choose column 1; play converges to the pure-strategy equilibrium,
row 2-column 1. But if player 1 does not behave asymptotically myopically and
instead chooses row 1 each time, then player 2 would eventually choose column
2. If player 1 discounts her payoffs with a discount factor close to one, this gives
her a higher overall payoff. The point is a simple one. As long as player 2 is
playing according to the model of fictitious play, player 1 can exploit this and
manipulate 2's beliefs in order to receive her "Stackelberg leader" outcome (cf.
Fudenberg and Levine, 1989).



                         Player 2
                    Column 1    Column 2

Player 1    Row 1     1,0         3,2

            Row 2     2,1         4,0

Fig.  4-1.  A  strategic-form  game  illustrating  the  possibility  of  Stackelberg  leadership. 
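
The Stackelberg argument can be checked numerically. The sketch below uses the payoffs of Figure 4-1 and lets player 2 follow fictitious play; the initial weights `(1, 3)`, the discount factor `delta = 0.99`, and the horizon of 2000 rounds are hypothetical choices made only for illustration, and ties (which do not arise with these weights) would be broken in favor of column 1.

```python
# Payoffs from Figure 4-1, indexed [row][col] with index 0 = row/column 1.
U1 = [[1, 3], [2, 4]]  # player 1
U2 = [[0, 2], [1, 0]]  # player 2


def discounted_payoff(row_policy, rounds=2000, delta=0.99, weights=(1, 3)):
    """Normalized discounted payoff to player 1 against a fictitious-play player 2.

    row_policy maps the round index t to the row index chosen by player 1.
    weights are player 2's (assumed) initial weights on rows 1 and 2.
    """
    w = list(weights)
    total = 0.0
    for t in range(rounds):
        # Player 2 best-responds to the weighted empirical frequencies of rows.
        col_payoffs = [U2[0][c] * w[0] + U2[1][c] * w[1] for c in range(2)]
        col = 1 if col_payoffs[1] > col_payoffs[0] else 0
        row = row_policy(t)
        total += (1 - delta) * delta ** t * U1[row][col]
        w[row] += 1  # player 2 records the row just played
    return total


myopic = discounted_payoff(lambda t: 1)  # always the dominant row 2
leader = discounted_payoff(lambda t: 0)  # always row 1
```

With these assumed parameters the myopic policy earns about 2 per period (row 2-column 1 throughout), while committing to row 1 earns close to 3 once player 2 switches to column 2.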

In  light  of  this  example,  our  assumption  of  asymptotically  myopic  behavior 
requires  some  defense  and  explanation.  We  defend  the  assumption  with  stories 
that  combine  two  justifications  in  varying  proportions:  First,  even  if  a  player's 
possible  influence  on  opponents'  future  play  is  large,  the  player  may  discount 
the  future  sufficiently  that  the  effects  are  unimportant. 

Second,  even  if  the  players  are  relatively  patient,  they  may  believe  that  their 
current  action  will  have  little,  if  any,  effect  on  what  will  happen  in  the  future. 
Suppose, for example, that player i believes that her rivals choose actions in
each period according to some fixed but unknown (and possibly mixed) strategy
profile, which is not influenced by the actions of other players. Moreover, because
i learns her rivals' actual play at each date regardless of what i chooses to do,
i's immediate choice of action will not affect what i learns, and thus (as long
as i's behavior is subsequently myopic) it will not affect i's own subsequent
actions. 10  Weakening this slightly, if i believes that her rivals will be playing a
fixed  strategy  asymptotically,  then  asymptotically  myopic  behavior  (for  the  same 
reasons)  is  warranted. 

10 The point of this remark may not be apparent. Imagine that i's choice of action in round t
affects the information she receives about the strategy choices of her rivals in that period. This would
be natural, for example, if we imagined that the stage game is an extensive-form game, and players
only observe the outcome of each round of play. Then i's choice of action today might affect her own
subsequent actions; and she might choose to invest in information today by taking an action that is
(myopically) suboptimal but that may generate useful information for guiding future choices.

We are not very happy with either of these two justifications on its own. In
order to permit learning to take place, play must be repeated "frequently," more
frequently than would be suggested by a substantial discount rate, except for
extraordinarily impatient players. And the story that players regard their rivals
as playing fixed strategies repeatedly suffers from internal inconsistency; why
should a player imagine that his rivals are so different from himself? A belief
that one's rivals will settle down to repeated play of a single strategy profile
(justifying asymptotic myopia) is more palatable, especially when each player, in
consequence, settles down to repeated play of a single strategy. But even in this
more palatable story, each player is (effectively) assuming that his rivals settle
down more quickly than the player does himself.

More  convincing  justifications  of  myopia  can  be  given  by  enriching  our  story 
and  combining  the  two  justifications.  Rather  than  thinking  of  a  small  group  of 
players  who  interact  repeatedly,  we  think  of  situations  in  which  there  are  a  large 
number  of  (potential)  players  who  interact  repeatedly.  Imagine  that  we  have  five 
thousand  players  1,  five  thousand  players  2,  and  so  on,  that  repeated  meetings 
between  the  same  sets  of  players  are  rare,  and  that  whenever  a  player  meets 
some  group  of  rivals,  he  is  unaware  of  how  these  rivals  acted  in  the  past.  To  be 
more  precise,  imagine  that  one  of  the  following  three  stories  holds. 

Story  1.  At  each  date  t ,  one  group  of  players  is  selected  to  play  the  game.  They 
do  so,  and  their  actions  are  revealed  to  all  the  potential  players.  Those  who  play 
at  date  t  are  then  returned  to  the  pool  of  potential  players,  and  another  group 
is  chosen  at  random  for  date  t  +  1 . 

Story 2. At each date t there is a random matching of all the players, so that
each player is assigned to a group with whom the game is played. At the end
of the period, it is reported to all how the entire population played. (That is, at
the end of the period, it is announced that twenty percent of the 1s chose row 1,
and so on.) The play of any particular player is never revealed.

Story 3. At each date t there is a random matching of the players, and each
group plays the game. Each player recalls at date t what happened in the
previous encounters in which he was involved, without knowing anything about
the identity or experiences of his current rivals.

In  each  of  these  stories,  myopic  behavior  seems  "sensible,"  for  reasons  that  mix  to 
varying  degrees  the  two  basic  justifications  given  above.  In  the  first  story,  the  first 
justification  is  mostly  at  work.  Although  the  game  is  played  relatively  frequently, 
any  single  individual  plays  very  infrequently,  and  at  any  reasonable  discount  rate, 
immediate  payoff  considerations  will  dominate  any  long-run  considerations.  In 
the  second  and  third  stories,  it  is  more  a  matter  of  each  player  believing  (now, 
with  good  reason)  that  his  own  immediate  actions  will  have  little  impact  on 
how  his  future  rivals  will  behave.  In  story  2,  this  is  because  each  player  may 
believe  that  how  he  behaves  will  have  little  influence  on  the  reported  aggregate 
distribution;  in  story  3,  this  is  because  each  player  attaches  low  probability  to 
the  possibility  that  his  current  rivals  will  be  future  rivals  any  time  soon,  or  even 
that  future  rivals  will  indirectly  be  affected  by  the  player's  own  immediate  play 
through an effect on the player's immediate rivals, who then (through some chain
of individuals) affect future rivals. These stories make myopic behavior more
plausible  intuitively,  although  it  remains  a  question  worth  exploring  whether  this 
plausible  intuition  has  some  firm,  formal  basis. 
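
To give a feel for the magnitudes in the first story, here is a small back-of-the-envelope sketch. It assumes (these are our illustrative assumptions, not part of the stories as stated) that a given player is selected independently at each date with probability p = 1/5000 and discounts at a per-date factor delta; the waiting time tau until her next play is then geometric, and the effective discount factor between her own plays is E[delta^tau] = p*delta / (1 - (1 - p)*delta).

```python
def effective_discount(delta, p):
    """E[delta**tau] for tau geometric on {1, 2, ...} with success probability p."""
    return p * delta / (1 - (1 - p) * delta)


def effective_discount_series(delta, p, terms=200_000):
    """The same expectation summed term by term, as a cross-check."""
    total = 0.0
    prob_wait = p  # probability that the next play is exactly k dates away
    disc = delta   # delta**k
    for _ in range(terms):
        total += prob_wait * disc
        prob_wait *= 1 - p
        disc *= delta
    return total


# Hypothetical numbers: a very patient per-date discount factor and the
# five-thousand-player pool of the text.
eff = effective_discount(0.999, 1 / 5000)
```

Even with a per-date discount factor as high as 0.999, the effective discount factor between a player's own successive plays comes out near 0.17 under these assumptions, which is one way to read the claim that immediate payoff considerations dominate.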

Adaptive  assessments 

Once we assume that behavior is (asymptotically) myopic, the next step is
to specify the assessment rules $\mu^i$ used by the players and, in particular, how
these assessments are revised as the players observe the actions of others. In
the models we will consider, players believe that, at least asymptotically, the
past choices of opponents are to some extent representative of future choices. A
fairly weak property that captures this idea is suggested by Milgrom and Roberts
(1991). 11


11 Milgrom and Roberts define adaptive behavior as opposed to assessments, but it will be apparent
that our formal definition is just a gloss on theirs.


Definition. The assessment rule $\mu^i$ is adaptive if for every $\epsilon > 0$ and for every t,
there is some $T(\epsilon, t)$ such that for all $t' > T(\epsilon, t)$ and histories $\zeta_{t'}$, $\mu^i_{t'}(\zeta_{t'})$ puts
probability no more than $\epsilon$ on the set of pure strategies by i's opponents that were not
played at all between times t and t' (according to $\zeta_{t'}$).

In  words,  the  definition  says  that  i  puts  very  little  weight  on  strategies  by  her 
rivals  that  have  not  been  played  for  a  long  (enough)  time.  The  class  of  adaptive 
assessment  rules  is  very  broad,  including,  for  example,  assessments  that  take  a 
weighted  average  of  the  history  of  past  plays  by  one's  opponent,  as  long  as  the 
weight  put  on  any  initial  segment  of  history  can  be  made  small  by  lengthening 
the  segment.  Four  examples  of  adaptive  assessment  rules  are:  (a)  assess  that 
one's rivals will play in period t whatever was played in t - 1 (the assessments
that  go  with  Cournotian  dynamics);  (b)  assess  that  one's  rivals  will  play  according 
to  an  exponentially  weighted  average  of  past  plays;  (c)  assess  that  one's  rivals  are 
equally  likely  to  play  any  action  that  has  been  played  at  least  one  percent  of  the 
time,  with  zero  probability  for  all  other  actions;  and  (d)  assess  according  to  the 
scheme  of  fictitious  play  (where  all  previous  observations  are  equally  weighted). 
While  the  class  of  adaptive  assessment  rules  is  broad,  there  are  arguments 
that  restricting  attention  to  this  class  is  too  restrictive.  See,  for  example,  the 
discussion  in  Milgrom  and  Roberts  (1991,  page  89ff)  concerning  sophisticated 
learning. 
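
The four examples (a) through (d) can be written down concretely. The sketch below is one possible rendering against a single opponent; the function names, the decay parameter of 0.5, and the prior weights are hypothetical choices for illustration. Each function takes the rival's past actions (oldest first) and returns an assessment as a dictionary of probabilities.

```python
from collections import Counter


def cournot(history):
    """(a) Rivals are assessed to repeat whatever they played last period."""
    return {history[-1]: 1.0}


def exponential_average(history, decay=0.5):
    """(b) Exponentially weighted average of past plays; decay is in (0, 1)."""
    weights = Counter()
    for age, action in enumerate(reversed(history)):
        weights[action] += decay ** age  # the most recent play has age 0
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def threshold(history, cutoff=0.01):
    """(c) Equal probability on every action played at least `cutoff` of the time."""
    counts = Counter(history)
    support = [a for a, c in counts.items() if c / len(history) >= cutoff]
    return {a: 1.0 / len(support) for a in support}


def fictitious_play(history, prior):
    """(d) Empirical frequencies, up to the initial weights in `prior`."""
    counts = Counter(history)
    total = len(history) + sum(prior.values())
    return {a: (counts[a] + w) / total for a, w in prior.items()}


history = ["H", "T", "T", "T"]  # rival's past actions, oldest first
fp = fictitious_play(history, {"H": 1, "T": 1})
ex = exponential_average(history)
```

On this short history, rule (a) puts all weight on T, rule (c) splits weight equally between H and T, and rules (b) and (d) put most but not all weight on T.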

Convergence  to  a  pure-strategy  profile 

We are now in a position to generalize Proposition 3.1. Fix assessment rules
$\mu^i$ and behavior rules $\phi^i$ for our two players. To state the result, we require a
definition.

Definition. The infinite history $\zeta = (s_1, s_2, \ldots)$ is said to be compatible with behavior
rules $\phi$ if for each t = 1, 2, ... and for i = 1, ..., I, the action $s^i_t$ is in the support of
$\phi^i_t(\zeta_t)$.

That is, $\zeta$ is something that could be observed with positive probability over all
finite time horizons for players who behave according to the behavior rules that
are given.

Proposition 4.1. 12  Let $\zeta$ be an infinite history $(s_1, s_2, \ldots)$ such that for some $s_* \in S$
and for some T, $s_t = s_*$ for all t > T. If $\zeta$ is compatible with adaptive assessment
rules $\mu^i$ and behavior rules $\phi^i$ that are strongly asymptotically myopic relative to the
assessment rules, then $s_*$ must be a Nash equilibrium of the stage game.

Proof. Normalize the payoffs in the game so that the range of payoffs for each
player is no greater than one. Suppose $\zeta_t$ is the partial history $(s_1, s_2, \ldots, s_t)$ of
$\zeta$. Maintain the hypotheses of the proposition; $\zeta$ is compatible with the $\phi^i$, and
$\zeta$'s components are eventually $s_*$. Suppose that $s_*$ is not a Nash equilibrium.
Then (without loss of generality) player 1 has a better response to $s_*^{-1}$ than $s_*^1$.
Let $\hat{s}^1$ be 1's better response, and set

$$\epsilon = \frac{u^1(\hat{s}^1, s_*^{-1}) - u^1(s_*^1, s_*^{-1})}{4}.$$

Because the $\mu^i$ are adaptive and because history eventually settles on repeated
play of $s_*$, we can find a T sufficiently large so that for all t > T, the
probability assessment of player 1, $\mu^1_t(\zeta_t)$, puts probability at least $1 - \epsilon$ on the play of
$s_*^{-1}$. Thus the expected payoff to 1 for all t > T from playing $s_*^1$ (against 1's
assessment of her rivals' strategy choices) is at least $4\epsilon(1 - \epsilon) - \epsilon = 3\epsilon - 4\epsilon^2 > \epsilon$
worse than 1's payoff from playing $\hat{s}^1$. 13  Thus $s_*^1$ is more than $\epsilon$ suboptimal
against $s_*^{-1}$ for all t > T, which implies that 1's behavior rule is not strongly
asymptotically myopic, a contradiction. ■


12 Compare with Milgrom and Roberts (1991, Theorem 3[ii]).

13 This is computed as the probability assessed that -1 plays $s_*^{-1}$, at least $1 - \epsilon$, times
$4\epsilon = u^1(\hat{s}^1, s_*^{-1}) - u^1(s_*^1, s_*^{-1})$, less the probability that -1 plays anything other than $s_*^{-1}$, which is
no more than $\epsilon$, times the maximum possible difference in payoffs playing $\hat{s}^1$ and $s_*^1$, which by
the normalization is one.

Note that the proposition assumes that behavior rules are strongly
asymptotically myopic. If we deleted the modifier strongly, the result would be false
as stated. Consider, for example, repeated play of the prisoners' dilemma, and
behavior where, at period t, each player chooses to cooperate with
probability 1/t and to defect with probability (t - 1)/t. Consider the infinite history
where each player cooperates in each period. This history is compatible with the
behavior rules. And (for any assessment rules) the behavior rules are
asymptotically myopic, because they involve playing a suboptimal strategy with vanishing
probability. But the strategy profile that is "repeated" in each period is not a
Nash equilibrium. The problem of course is that compatibility only requires that
each finite history has positive probability for the behavior rules. For the given
behavior rules, a history where players cooperate in each period has prior
probability zero. 14  In order to obtain a result in the spirit of Proposition 4.1, but with
asymptotic myopia instead of strong asymptotic myopia, we must either be more
careful about how we make histories consistent with the given behavior rules or
study not the actual history of play but the intended strategies of the players.
We provide one result along these lines at the end of Section 6.
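
The gap between "each finite history has positive probability" and "the infinite history has probability zero" is easy to see numerically. The following sketch assumes, for illustration, that the two players randomize independently of each other and across periods, each cooperating with probability 1/t at period t; the probability that both cooperate in every period from 1 through T is then the product of (1/t)^2 over those periods.

```python
from fractions import Fraction


def prob_all_cooperate(T):
    """Probability that both players cooperate in every period 1..T when each
    cooperates with probability 1/t at period t, independently (an assumption
    of this sketch)."""
    p = Fraction(1)
    for t in range(1, T + 1):
        p *= Fraction(1, t) ** 2
    return p


p3 = prob_all_cooperate(3)    # exact value 1/36
p20 = prob_all_cooperate(20)  # strictly positive but already negligible
```

Every finite horizon gives a strictly positive number, so the always-cooperate history is compatible with the behavior rules, yet the product tends to zero as T grows.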

Convergence to mixed strategies in empirical frequencies for I = 2

Next  we  proceed  to  generalizations  of  fictitious  play  and  convergence  in 
the  second  sense  of  Section  3,  where  we  look  for  convergence  of  the  empirical 
frequencies  of  observations  to  some  (possibly  mixed)  strategy  profile. 

For the remainder of this section, assume that the game has two players
only; i.e., I = 2. We will take up the case of more than two players in Section 5.

14 For the given behavior rules, no single complete history has positive probability; in fact, with
probability one, there is never a complete cessation of cooperation by one side or the other, although
eventually cooperate-cooperate is no longer observed. All of which is quite beside the immediate
point.

Let $\sigma(\zeta_t) : S \to [0, \infty)$ give the vector of proportions of strategy profiles
in S along the partial history $\zeta_t$; i.e., $\sigma(\zeta_t)(s)$ gives the number of times s
was played in periods 1 through t - 1, divided by t - 1. We write $\sigma^i(\zeta_t)$ for
the marginal frequency distribution on $S^i$ induced by $\sigma(\zeta_t)$; i.e.,

$$\sigma^i(\zeta_t)(s^i) = \sum_{s^{-i} \in S^{-i}} \sigma(\zeta_t)(s^i, s^{-i}).$$

Then, in the spirit of Proposition 3.2, we are looking for
conditions on assessment and behavior rules that guarantee,

Suppose $\zeta$ is an infinite history $(s_1, s_2, \ldots)$ such that for some $\sigma_* \in \Sigma$,

$$\lim_{t \to \infty} \sigma^i(\zeta_t) = \sigma^i_*$$

for i = 1, 2. Then $\sigma_*$ is a Nash equilibrium of the stage game. 15

To get this result, it is insufficient that behavior is strongly asymptotically
myopic with respect to adaptive assessment rules. Consider, for example, the game
matching pennies, and suppose that at dates t > 4, the two players assess equal
probabilities for any strategy by their rival that has occurred at least ten percent
of the time in the past; until date 4, they assess equal probabilities for the two
strategies.  As  for  behavior,  players  behave  myopically  optimally  in  all  instances, 
with  the  following  specification  if  the  assessment  leaves  the  player  indifferent: 
If  t  is  divisible  by  3,  then  play  "heads";  otherwise  play  "tails."  What  happens 
is  that  the  sequence  of  plays  is  tails,  tails,  heads,  tails,  tails,  heads,  . . . ,  for  both 
players,  and  each  always  assesses  equal  probabilities  for  his  rival's  two  strategies. 
Empirical  frequencies  converge  to  (1/3,2/3)  for  each,  which  (of  course)  is  not  a 
Nash  equilibrium  of  the  stage  game. 16 
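
The matching-pennies example can be simulated directly. The sketch below follows the verbal description: equal assessed probabilities until date 4, then equal probabilities on every strategy seen at least ten percent of the time, myopic best responses, and the stated tie-breaking rule. The payoff convention (player 0 gains 1 on a match and loses 1 otherwise, player 1 the reverse) and the horizon of 3000 rounds are assumptions of the sketch.

```python
ACTIONS = "HT"


def assess(rival_counts, t, cutoff=0.10):
    """Assessment at date t from the rival's past action counts."""
    if t <= 4:
        return {a: 0.5 for a in ACTIONS}
    n = sum(rival_counts.values())
    support = [a for a in ACTIONS if rival_counts[a] / n >= cutoff]
    return {a: (1.0 / len(support) if a in support else 0.0) for a in ACTIONS}


def payoff(i, own, rival):
    """Player 0 wants to match, player 1 wants to mismatch (assumed convention)."""
    match = 1 if own == rival else -1
    return match if i == 0 else -match


def simulate(rounds=3000):
    counts = [{a: 0 for a in ACTIONS} for _ in range(2)]  # counts[i]: plays by i
    for t in range(1, rounds + 1):
        plays = []
        for i in (0, 1):
            mu = assess(counts[1 - i], t)
            value = {a: sum(mu[b] * payoff(i, a, b) for b in ACTIONS) for a in ACTIONS}
            if abs(value["H"] - value["T"]) < 1e-12:
                a = "H" if t % 3 == 0 else "T"  # tie-breaking rule from the text
            else:
                a = max(value, key=value.get)
            plays.append(a)
        for i in (0, 1):
            counts[i][plays[i]] += 1
    return counts


counts = simulate()
freq_heads = [c["H"] / sum(c.values()) for c in counts]
final_assessment = assess(counts[1], 3001)
# Sup-norm gap between player 0's assessment and player 1's empirical frequencies.
gap = max(abs(final_assessment[a] - counts[1][a] / 3000) for a in ACTIONS)
```

In this run both players play heads one third of the time, while each continues to assess (1/2, 1/2); the sup-norm gap between assessment and empirical frequency stays at 1/6 instead of vanishing.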

The difficulty, it should be clear, comes from the fairly weak requirements
of being an adaptive assessment rule. When only one strategy choice by rivals
is eventually observed, adaptive assessment rules converge together with the
(degenerate) empirical frequencies of observations. But when rivals use more
than one (pure) strategy with nonvanishing frequency (or even with vanishing
frequency that vanishes sufficiently slowly), adaptive decision rules can assign
probability to that strategy that is unrelated to its limiting empirical frequency.
To obtain the result that we seek, we must sharpen considerably the criterion
imposed on assessment rules. The simplest and most direct criterion that works
runs as follows.

15 Please note carefully, this isn't quite the same as asking that $\lim_t \sigma(\zeta_t) = \sigma_*$. We are only
asking that the marginal frequencies converge, and not the joint frequency distribution. There is a
lot behind this observation, to which we return in the next section.

16 At the cost of complicating the description of the assessment and behavior rules, we can modify
this so that the two players eventually play the mixed strategies (1/3, 2/3) at all dates;
nonconvergence of their intended strategies is not the issue.

Definition. The assessment rule $\mu^i$ is asymptotically empirical if for every $\zeta \in Z$,

$$\lim_{t \to \infty} \| \mu^i_t(\zeta_t) - \sigma^{-i}(\zeta_t) \| = 0,$$

where the $\zeta_t$ are subhistories of the fixed $\zeta$. 17

It  is  easy  to  see  that:  any  asymptotically  empirical  assessment  rule  is  adaptive; 
there  are  adaptive  assessment  rules  that  are  not  asymptotically  empirical;  the 
assessment  rule  in  the  model  of  fictitious  play  is  asymptotically  empirical. 

Is  it  reasonable  to  insist  that  assessment  rules  are  asymptotically  empirical? 
This  property  is  natural  if  one's  picture  of  a  rival's  dynamic  behavior  is  that  the 
rival  is  playing  some  (unknown)  strategy  repeatedly,  or  even  if  one  supposes 
that  one's  rival  will  converge  to  repeated,  independent  play  of  some  (unknown) 
strategy.  But  if  you  think  that  your  rival's  strategy  may  shift  repeatedly  through 
time  —  in  response  to  some  Markov  state  variable  such  as  a  sunspot,  say  — 
then  some  assessment  scheme  that  puts  relatively  more  weight  on  more  recent 
observations  or  that  tries  to  uncover  the  probabilistic  structure  of  the  regime  shifts 
would  be  more  reasonable. 

Proposition 4.2. Let $\zeta$ be an infinite history $(s_1, s_2, \ldots)$ such that for some $\sigma_* \in \Sigma$,

$$\lim_{t \to \infty} \sigma^i(\zeta_t) = \sigma^i_*$$

for i = 1, 2. If $\zeta$ is compatible with asymptotically empirical assessment rules $\mu^i$ and
behavior rules $\phi^i$ that are strongly asymptotically myopic relative to the assessment
rules, then $\sigma_*$ is a Nash equilibrium of the stage game.

17 Whenever we are dealing with finite dimensional vectors as here, $\| \cdot \|$ denotes the sup norm.

The proof resembles the proof of Proposition 4.1 with the following
amendments. First, since the assessment rules $\mu^i$ are asymptotically empirical, the
assessments of player i (given by $\mu^i$) at the partial histories $\zeta_t$ converge to the
mixed strategy $\sigma_*^{-i}$. If $\sigma_*^i$ is not a best response to $\sigma_*^{-i}$, then there is some pure
strategy $\hat{s}^i$ for player i that is strictly better against $\sigma_*^{-i}$ than is some $s^i$ in the
support of $\sigma_*^i$. By a standard argument, for some $\epsilon > 0$ and sufficiently large T,
$s^i$ will be worse against $\mu^i_t(\zeta_t)$ than is $\hat{s}^i$ by more than $\epsilon$, for all t > T. Thus $s^i$
will not be played eventually (by asymptotic myopia). But this would contradict
$s^i$ being in the support of the limiting frequencies of i's strategy choices.

5.  Objections to convergence of the empirical distributions
as a convergence criterion

Notwithstanding  the  results  of  the  previous  section,  the  convergence  criterion 
employed  fails  to  capture  what  we  want  for  a  model  of  "learning  to  play  mixed 
strategies."  Our  objections  begin  with  the  obvious  observation  that  in  examples 
such  as  fictitious  play,  players  are  (almost)  never  playing  mixed  strategies.  They 
are  instead  jumping  from  one  pure  strategy  to  another,  (typically)  in  cycles  of 
ever-increasing  length,  so  behavior  is  not  converging. 

The rebuttal to this is that while behavior is not converging, beliefs are.
Mixed equilibria are sometimes interpreted as equilibria in beliefs; each side
believes the other to be acting in a manner that makes the first (nearly) indifferent
among several actions. Under this interpretation, the convergence criterion used
in the previous section is fairly natural if players ignore the cycles in their own
and their opponent's play.

18 The conclusion of Proposition 4.2 does not require the full power of asymptotically empirical
assessment rules; e.g., the conclusion still holds for assessment rules that do not approach the
empirical frequencies along histories where the empirical frequencies don't converge. More concretely,
suppose that $\mu^i$ reports the empirical frequencies of -i's choices over the most recent $\alpha$ percent
of history, for $\alpha$ strictly between zero and one hundred. This assessment rule is not asymptotically
empirical per our formal definition, but it is empirical enough so that Proposition 4.2 holds.

However these cycles can lead to phenomena so striking that we do not
believe they would be ignored. Consider, for example, a symmetric battle of the
sexes as depicted in Figure 5-1. Imagine play of this game using the precise
method of fictitious play, where each player begins with the relative beliefs
vector $(1, \sqrt{2})$. The symmetry of the situation implies that if player 1 chooses top
in the first round, 2 will choose left, and vice versa. In fact, with the numbers
we are given, top-left will be played, and each player's relative beliefs going
into the second round will be given by $(2, \sqrt{2})$. The symmetry again implies
play of either top-left or bottom-right. And so on, inductively. 19  From general
results about fictitious play, we know that empirical frequencies will converge
to the Nash equilibrium probabilities (2/3, 1/3). But this will be realized with
perfect correlation in the two players' choices: Top-left will be played two-thirds
of the time, and bottom-right one-third. Players will get zero round after round,
there will be perfect correlation in their actions, and yet according to the
theory they will persist in believing that they are "converging" to the mixed Nash
equilibrium.

Fig. 5-1. The battle of the sexes.

19 Because the payoffs are rational and the relative weights have an irrational ratio, there will
never be a tie; each player will have a unique best response at all times.
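
The perfectly correlated path can be reproduced in a few lines. Because the payoff matrix of Figure 5-1 is not legible here, the sketch assumes payoffs that are consistent with the description in the text (zero for both players on the main diagonal, and a mixed equilibrium in which top and left each have probability 2/3): (2, 1) at top-right and (1, 2) at bottom-left. The initial weights $(1, \sqrt{2})$ and the horizon of 3000 rounds are likewise assumptions of the sketch.

```python
import math

# Assumed payoffs, indexed [row][col]; row 0 = top, column 0 = left.
U1 = [[0, 2], [1, 0]]  # player 1 (row chooser)
U2 = [[0, 1], [2, 0]]  # player 2 (column chooser)


def fictitious_play(rounds, w0=(1.0, math.sqrt(2))):
    """Exact fictitious play; returns the list of (row, col) pairs played."""
    w1 = list(w0)  # player 1's weights on (left, right)
    w2 = list(w0)  # player 2's weights on (top, bottom)
    history = []
    for _ in range(rounds):
        # Expected payoffs up to normalization by the total weight.
        row_values = [sum(U1[r][c] * w1[c] for c in range(2)) for r in range(2)]
        col_values = [sum(U2[r][c] * w2[r] for r in range(2)) for c in range(2)]
        r = 0 if row_values[0] > row_values[1] else 1
        c = 0 if col_values[0] > col_values[1] else 1
        history.append((r, c))
        w1[c] += 1  # player 1 records player 2's action
        w2[r] += 1  # player 2 records player 1's action
    return history


history = fictitious_play(3000)
on_diagonal = all(r == c for r, c in history)
freq_top = sum(1 for r, _ in history if r == 0) / len(history)
realized_payoff_1 = sum(U1[r][c] for r, c in history)
```

Under these assumptions every round lands on top-left or bottom-right, the frequency of top approaches 2/3, and player 1's realized payoff is zero throughout, even though each marginal frequency looks like equilibrium play.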




Moreover,  this  example  shows  that  Proposition  4.2  will  run  into  difficulties 
for  the  case  I  >  2 .  Imagine  a  three-player  game,  in  which  the  actions  of  player 
3,  from  the  perspective  of  players  1  and  2,  are  irrelevant.  Players  1  and  2  simply 
play  the  battle  of  the  sexes  against  each  other  in  each  round.  Player  3,  to  choose 
an optimal strategy, must forecast the joint actions of her rivals; for the sake of
definiteness,  suppose  her  optimal  action  is  tic  if  she  believes  that  they  will  play 
to  a  main-diagonal  cell  with  probability  2/3  or  more,  and  her  optimal  action  is 
tac  otherwise. 

What  should  3  conclude,  asymptotically,  if  1  and  2  act  in  accordance  with 
the  particular  model  of  fictitious  play  given  above?  Should  she  conclude  that 
their  actions  are  perfectly  correlated,  always  playing  along  the  main  diagonal, 
hence  tic  is  optimal?  Or  should  she  conclude  that  1  will  play  top  two-thirds  of 
the  time,  2  will  play  left  two-thirds  of  the  time,  and  hence  top-left  has  asymptotic 
probability  four-ninths,  bottom-right  has  probability  one-ninth,  and  thus  tac  is 
optimal? 
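
The two candidate conclusions for player 3 amount to a short calculation, sketched below with exact fractions. The variable names are ours; the numbers (marginal frequencies of 2/3 for top and left, play confined to the main diagonal, and the 2/3 cutoff for tic) are those of the example.

```python
from fractions import Fraction

p_top = Fraction(2, 3)   # limiting frequency with which 1 plays top
p_left = Fraction(2, 3)  # limiting frequency with which 2 plays left

# Empirical joint frequencies along the fictitious-play path: diagonal only.
joint = {("top", "left"): Fraction(2, 3), ("bottom", "right"): Fraction(1, 3)}

# Forecast 1: use the empirical joint distribution, correlation included.
diag_joint = joint[("top", "left")] + joint[("bottom", "right")]

# Forecast 2: use the product of the marginals, forcing independence.
diag_product = p_top * p_left + (1 - p_top) * (1 - p_left)


def best_action(diag_prob):
    """Player 3's rule from the text: tic iff the diagonal has probability >= 2/3."""
    return "tic" if diag_prob >= Fraction(2, 3) else "tac"


choice_joint = best_action(diag_joint)      # diagonal probability 1
choice_product = best_action(diag_product)  # diagonal probability 4/9 + 1/9
```

The joint forecast puts probability 1 on the diagonal and recommends tic; the independent forecast puts only 5/9 on the diagonal and recommends tac.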

There are (at least) two different ways we could proceed, depending on how
we extend the definition of asymptotically empirical assessments. One
possible definition is precisely the definition given before, interpreting $\sigma^{-i}(\zeta_t)$ as the
marginal frequency distribution along $\zeta_t$ of profiles from $S^{-i}$. Under this
definition, i's assessment (asymptotically) reflects any correlations in the play of her
rivals that are observed empirically. The example shows how this definition
permits convergence (under fictitious play) to non-Nash (correlated) assessments,
so that Proposition 4.2 fails. 20  An alternative definition supposes that players
asymptotically assess independent play by their rivals, regardless of the
empirical frequencies. 21  Then we obtain Proposition 4.2 for I > 2. However this
seems to us to be somewhat unnatural; if there is correlation asymptotically, we
feel that it is unnatural to assume that players ignore it.

20 One can repair the proposition in this case as follows: Stipulate that along the history $\zeta$ the
empirical joint frequencies are the products (in the limit) of the empirical margins. But this repair
seems a bit cheesy.

21 One way to formalize this is to define, for each t, $\zeta_t$, and $s \in S$, $\eta(\zeta_t)(s) = \prod_{i=1}^{I} \sigma^i(\zeta_t)(s^i)$.

Moreover,  if  it  is  unnatural  for  player  3  to  ignore  correlation  in  the  choices  of 
players  1  and  2,  then  isn't  it  equally  unnatural  for  player  1  to  ignore  correlation 
in  her  choice  of  strategy  and  that  of  player  2?  If  so,  then  the  example  indicates 
that  even  for  two-player  games,  asymptotic  empiricism  as  formulated  may  be 
dubious. 

All  these  objections  (past  our  first  and  most  basic  objection)  are  grounded 
in  the  battle-of-the-sexes  example;  if  that  example  is  nongeneric,  perhaps  these 
objections  have  less  force.  In  fact,  it  can  be  shown  that  the  example  is  nongeneric 
for  2x2  games:  In  a  2x2  game,  for  generically  chosen  payoffs  the  actions  of  the 
two  players  (under  the  model  of  fictitious  play)  will  be  asymptotically  uncorre- 
cted. However  we  conjecture  that  robust  examples  of  asymptotic  correlation  can 
be  found  in  larger  games.  The  basis  for  this  conjecture  is  the  game  rock-scissors- 
paper.  Fictitious  play  in  this  game  must  converge  to  the  unique  Nash  equilibrium 
(1/3,1/3,1/3),  since  the  game  is  zero  sum.  And,  for  most  initial  weight  vectors, 
this  happens  while  (asymptotically)  avoiding  the  three  cells  along  the  main  diag- 
onal. (Each  of  the  other  cells  has  asymptotic  frequency  1/6.)  We  conjecture  that 
these  properties  hold  for  a  neighborhood  of  games  around  rock-paper-scissors, 
although we are unable to prove convergence to the Nash equilibrium frequencies
(since most games in a neighborhood will be nonzero sum), nor are we
sure of the asymptotic frequencies of the cells. Robust examples can be created
easily, though, if we move away from the strictures of exact fictitious play.
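
The rock-scissors-paper observation can be explored numerically. The following is a minimal sketch, assuming exact fictitious play with deterministic tie-breaking (lowest index) and arbitrary initial weight vectors; the function names and the starting weights are illustrative choices, not taken from the paper.

```python
# Row player's payoff in rock-scissors-paper (zero sum).
# Index 0 = rock, 1 = scissors, 2 = paper: rock beats scissors,
# scissors beats paper, paper beats rock.
PAYOFF = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]


def best_reply(weights):
    """Pure strategy with the highest expected payoff against the weights
    placed on the rival's actions (ties go to the lowest index)."""
    values = [
        sum(PAYOFF[own][other] * w for other, w in enumerate(weights))
        for own in range(3)
    ]
    return max(range(3), key=lambda own: values[own])


def fictitious_play(rounds, weights_on_2, weights_on_1):
    """Run fictitious play and return the 3x3 table of joint frequencies.

    weights_on_2: player 1's initial weights on player 2's actions.
    weights_on_1: player 2's initial weights on player 1's actions.
    """
    w2 = list(weights_on_2)
    w1 = list(weights_on_1)
    joint = [[0] * 3 for _ in range(3)]
    for _ in range(rounds):
        a1 = best_reply(w2)
        # The payoff matrix is antisymmetric, so player 2's payoff from
        # her own action a2 against a1 is PAYOFF[a2][a1].
        a2 = best_reply(w1)
        joint[a1][a2] += 1
        w2[a2] += 1
        w1[a1] += 1
    return [[count / rounds for count in row] for row in joint]


if __name__ == "__main__":
    table = fictitious_play(20000, [1.0, 0.25, 0.0], [0.0, 0.5, 1.0])
    for row in table:
        print(["%.3f" % x for x in row])
```

Comparing the diagonal entries of the printed table with the off-diagonal ones gives a quick check of the claim about asymptotic correlation for a particular choice of initial weights.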

Because  we  cannot  verify  our  conjecture,  we  do  not  leave  neat-and-tidy  our 
secondary  objections  based  on  asymptotic  correlation  in  empirical  frequencies. 
Nonetheless  in  our  view  the  first  objection  —  that  this  mode  of  convergence 


That is, $\eta_t(\zeta_t)$ gives the "frequency distribution" obtained by using the marginal frequencies $\hat\sigma^i$ and
forcing independence. Let $\eta_t^{-i}(\zeta_t)$ give the $S^{-i}$ marginal distribution of $\eta_t(\zeta_t)$. Then asymptotic
empiricism in this second sense is the condition $\lim_{t \to \infty} \|\mu^i_t(\zeta_t) - \eta_t^{-i}(\zeta_t)\| = 0$ along every history $\zeta$.


does  not  correspond  to  learning  to  play  mixed  strategies  —  suffices  to  motivate 
research  into  stronger  modes  of  convergence.  With  this  motivation,  then,  we 
proceed. 

6.  Convergence  of  behavior  strategies 

Rather  than  look  for  convergence  of  empirical  frequencies  (and  hence  as- 
sessments about  the  actions  of  others),  we  look  for  convergence  of  the  behavior 
strategies employed by players. That is, we study convergence (in $t$) of $\phi^i_t(\zeta_t)$
to some $\sigma^i_* \in \Sigma^i$, for each player $i$.

Because  we  wish  to  consider  games  with  I  >  2 ,  we  must  first  specify  how 
we  will  adapt  the  definition  of  asymptotically  empirical  assessments.  We  proceed 
in  the  easiest  fashion,  by  using  the  definition  precisely  as  it  was  given  earlier,  but 
interpreting $-i$ as the set of $i$'s rivals. That is, $i$'s assessments asymptotically
exhibit  any  correlation  that  is  observed  empirically  in  the  choices  of  her  rivals. 

Two problems surface immediately. First, if $\phi^i_t(\zeta_t)$ is meant to converge
to a mixed strategy, then player $i$ must be willing to play one of several pure
strategies.  In  the  model  of  fictitious  play,  we  insisted  that  players  choose  only 
myopic  best  replies,  computed  on  the  basis  of  their  assessments  of  the  actions 
of  their  rivals.  How  likely  is  it  that  a  player,  based  on  some  history  of  play, 
would  assess  for  his  rival  precisely  the  mixed  strategy  that  makes  him  (the  first 
player)  indifferent?  If  this  is  unlikely,  how  can  we  ever  have  players  willfully 
randomizing? 

It  is  here  that  the  asymptotic  parts  of  asymptotic  myopia  and  asymptotic 
empiricism  come  into  play.  We  do  not  insist  in  general  that  players  play  only 
myopic  best  responses;  they  can  play  slightly  suboptimal  responses,  as  long  as  the 
degree  of  suboptimality  vanishes  as  time  ( t )  passes.  So  if  assessments  converge 
to  the  equilibrium  mixed  strategies  quickly  enough  relative  to  the  rate  at  which 
the  allowable  suboptimality  vanishes,  we  can  sustain  mixed  strategies  even  if 
assessments do not match precisely the equilibrium mixtures. At the same time,
we  do  not  insist  that  players'  assessments  are  precisely  empirical;  if  the  empirical 
frequencies  of  play  converge  to  some  equilibrium  mixed  strategy,  then  players' 
beliefs  can  sit  at  precisely  that  limit  mixed  strategy,  justifying  the  play  of  mixed 
strategies  even  if  behavior  is  precisely  myopic. 

This means that the divergences from precisely myopic behavior and precisely
empirical beliefs that we allow carry a lot of power in our story, at least insofar
as  convergence  to  mixed  strategies  is  concerned.  As  we  shall  see,  we  don't  require 
both  at  once.  That  is,  our  results  obtain  with  asymptotic  myopia  and  beliefs  that 
are  precisely  empirical,  or  with  behavior  that  is  precisely  myopic  and  beliefs  that 
are  asymptotically  empirical.  But  we  will  need  one  or  the  other,  if  we  are  to  hope 
for convergence of behavior to mixed-strategy profiles.

The  formal  "nonconvergence"  criterion:  Unstable  strategy  profiles 

The  second  problem  that  is  raised  is  that  statements  of  convergence  in  terms 
of  behavior  strategies  must  be  probabilistic  statements.  To  see  what  is  at  issue 
here,  imagine  playing  the  matching  pennies  game  repeatedly.  Suppose  that  along 
some history $\zeta$, the empirical frequencies of the two rows and the two columns
approach  (1/2,1/2),  but  the  behavior  rules  converge  to  the  mixed  strategies 
(1/3,2/3).  This,  you  may  object,  is  very  unlikely.  How  could  players'  behavior 
strategies  be  converging  to  (1/3,2/3)  and  at  the  same  time  empirical  frequencies 
are  approaching  (1/2,1/2)?  Unlikely  is  just  the  right  word.  There  is  nothing 
that prevents this (any history $\zeta$ is compatible with behavior rules that have
players  mixing  strictly  in  each  round  —  but  by  the  strong  law  of  large  numbers, 
this history belongs to an event of probability zero. If behavior strategies are converging
to (1/3,2/3), then the strong law of large numbers says that empirical
frequencies  will  converge  to  (1/3,2/3)  with  probability  one.  Given  asymptoti- 
cally empirical  assessment  rules,  this  would  rule  out  players  continuing  to  play 
anything  close  to  the  (1/3,2/3)  strategies. 
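
The role of the strong law of large numbers in this argument can be illustrated with a short simulation. This is only a sketch, assuming a player who mixes (1/3, 2/3) independently in every round; the function name, seed, and number of rounds are illustrative.

```python
import random


def empirical_frequency(prob_first, rounds, seed=0):
    """Fraction of rounds in which the first pure strategy is played when the
    behavior strategy puts probability prob_first on it in every round."""
    rng = random.Random(seed)
    plays = sum(1 for _ in range(rounds) if rng.random() < prob_first)
    return plays / rounds


if __name__ == "__main__":
    # With intended play (1/3, 2/3), long-run frequencies settle near 1/3,
    # not near 1/2.
    print(empirical_frequency(1.0 / 3.0, 100000))
```

A history in which the frequency instead approaches 1/2 is not impossible, but it belongs to an event of probability zero, which is why convergence statements for behavior strategies have to be probabilistic.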
Accordingly, when giving results in the spirit of Propositions 4.1 and 4.2,
we give results of the following form: If $\sigma_*$ is not a Nash equilibrium, then
there is probability zero that behavior will remain forever in a small-enough
neighborhood of $\sigma_*$, no matter what the initial conditions are. The formalities
run as follows.

Fix a set of behavior rules $\phi^i$ (which will be accompanied by assessment
rules $\mu^i$, although for the time being only the behavior rules are needed). Recall
that $P(\cdot \mid \zeta_t)$ represents the objective conditional probability distribution on the
space $Z$ created by starting at $\zeta_t$ and using the behavior rules thereafter.

Definition. A strategy profile $\sigma_* \in \Sigma$ is unstable if there exists some $\epsilon > 0$ such that
for all $t$ and $\zeta_t$, $P(\|\phi_{t'}(\zeta_{t'}) - \sigma_*\| < \epsilon \text{ for all } t' > t \mid \zeta_t) = 0$.
Note that $\epsilon$ here is independent of the starting conditions $\zeta_t$.

Proposition 6.1. Fix behavior rules $\phi^i$ that are asymptotically myopic relative to some
asymptotically empirical assessment rules $\mu^i$. Then every strategy profile $\sigma_*$ that is
not a Nash equilibrium is unstable.

Note  that  in  this  proposition,  behavior  rules  are  required  to  be  (only)  asymp- 
totically myopic  relative  to  the  assessment  rules.  Compare  with  Propositions  4.1 
and  4.2,  in  which  strong  asymptotic  myopia  was  assumed.  We  return  to  this 
point  at  the  end  of  this  section. 

Although  the  details  of  the  proof  of  this  proposition  are  tedious,  the  idea 
is  fairly  simple.  If  behavior  lies  forever  in  a  small  neighborhood  of  the  strategy 
profile $\sigma_*$, then empirical frequencies will eventually lie in a small neighborhood
too, and thus the players' assessments will too. If the strategy profile isn't a Nash
equilibrium, then (eventually) some player will want to move far away from the
strategy  profile,  a  contradiction  to  the  supposition. 

The  key  technical  result  is  the  application  of  the  strong  law  of  large  numbers, 
which  we  state  in  the  form  of  a  lemma. 
Lemma 6.2. Let $\{x_t;\ t = 1, 2, \ldots\}$ be a sequence of random variables on a probability
space with range some finite set $A$. Fix a probability distribution $\pi$ on $A$ and an
$\epsilon > 0$, and let $\Lambda$ be the (measurable) subset of the probability space consisting of all
sample points such that for $t = 1, 2, \ldots$, the distribution of each $x_t$ conditional on
$\{x_1, \ldots, x_{t-1}\}$, denoted $\pi_t(\cdot \mid x_1, \ldots, x_{t-1})$, satisfies

$$\max_{a \in A} |\pi_t(a \mid x_1, \ldots, x_{t-1}) - \pi(a)| < \epsilon.$$

Let $\tau_t(a)$ be the random variable $\sum_{t'=1}^{t} 1_a(x_{t'})$; that is, $\tau_t(a)$ is the number of times
that $x_{t'} = a$ for $t' = 1, \ldots, t$. Then

$$\limsup_{t \to \infty} \frac{\tau_t(a)}{t} \le \pi(a) + \epsilon \quad \text{and} \quad \liminf_{t \to \infty} \frac{\tau_t(a)}{t} \ge \pi(a) - \epsilon$$

for all $a \in A$, almost surely conditional on $\Lambda$.
Proof of Lemma 6.2. Fix any $a \in A$. Construct a "standard" probability space
$\{\Omega, \mathcal{F}, P\}$ where $\Omega = [0,1]^{\{1,2,\ldots\}}$ and, writing $\omega_t$ as the $t$th component of $\omega$, the
sequence $\{\omega_t\}$ is an independent sequence of random variables, each uniformly
distributed on the unit interval. Enumerate $A$ as $\{a_1, a_2, \ldots, a_N\}$ (where $N$ is
the cardinality of $A$), with $a_1 = a$. Now define random variables $y_t$ on this
standard probability space as follows: For $t = 1$, $y_1(\omega) = a_n$ for that index $n$
such that

$$\sum_{m=1}^{n-1} \pi_1(a_m) \le \omega_1 < \sum_{m=1}^{n} \pi_1(a_m).$$

Then, inductively in $t$, let $y_t(\omega) = a_n$ for that index $n$ such that

$$\sum_{m=1}^{n-1} \pi_t(a_m \mid y_1(\omega), \ldots, y_{t-1}(\omega)) \le \omega_t < \sum_{m=1}^{n} \pi_t(a_m \mid y_1(\omega), \ldots, y_{t-1}(\omega)).$$

This  construction  uses  the  uniformly  distributed  random  variables  to  construct 
a  sequence  of  random  variables  whose  joint  distribution  is  identical  to  the  joint 


If $\Lambda$ has zero prior probability, the lemma is taken to be vacuous.
distribution of the original sequence $\{x_t\}$. Accordingly, we can define the event $\Lambda$
of the lemma in terms of the constructed probability space, and if we prove that the stated bounds
on the limits superior and inferior of $\tau_t(a)/t$ hold almost surely conditional on
$\Lambda$ for each $a$ taken one at a time, then they hold almost surely on $\Lambda$ for all the
(finitely many) $a$'s simultaneously, which then gives the desired result.

But showing that the two bounds hold on $\Lambda$ for the $y_t$ sequence and the
fixed $a$ is easy. Because we set $a_1 = a$, the set of points $\omega \in \Lambda$ for which
$y_t(\omega) = a$ contains the set $\{\omega : \omega_t < \pi(a) - \epsilon\}$ and is contained within the set
$\{\omega : \omega_t < \pi(a) + \epsilon\}$. This is so because $\pi(a) - \epsilon < \pi_t(a \mid y_1, \ldots, y_{t-1}) < \pi(a) + \epsilon$ for
all $t$, for points in $\Lambda$ by definition; then compare with how we determine those
$\omega$ for which $y_t(\omega) = a$. For $r \in [0,1]$, let $\nu_t(r, \omega)$ be the number of times that
$\omega_{t'} < r$ for $t' = 1, \ldots, t$. Then the estimate

$$\nu_t(\pi(a) - \epsilon, \omega) \le \tau_t(a, \omega) \le \nu_t(\pi(a) + \epsilon, \omega) \quad \text{for } \omega \in \Lambda$$

follows from the asserted set inclusions. By the strong law of large numbers,

$$\lim_{t \to \infty} \frac{\nu_t(r, \omega)}{t} = r$$

with probability one for each $r$ individually, so this holds with probability one
both for $r = \pi(a) - \epsilon$ and $r = \pi(a) + \epsilon$. This, combined with the previous bounds
on $\tau_t(a)$ on $\Lambda$, gives precisely the desired result. ■
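
The coupling used in this proof can be sketched in code. The example below is illustrative only: it assumes a single outcome $a$ placed first in the enumeration, a hypothetical history-dependent rule for the conditional probability of $a$ that always stays strictly within $\epsilon$ of $\pi(a)$, and independent uniform draws playing the role of the $\omega_t$. None of the names come from the paper.

```python
import random


def coupled_counts(pi_a, eps, rounds, seed=0):
    """Return (nu_low, tau, nu_high) after `rounds` periods.

    tau     counts periods with y_t = a, i.e. omega_t < pi_t(a | history);
    nu_low  counts periods with omega_t < pi_a - eps;
    nu_high counts periods with omega_t < pi_a + eps.
    """
    rng = random.Random(seed)
    tau = nu_low = nu_high = 0
    for t in range(1, rounds + 1):
        # Hypothetical conditional probability: it depends on the history
        # through the running frequency of a, but never leaves the open
        # interval (pi_a - eps, pi_a + eps).
        running = tau / (t - 1) if t > 1 else pi_a
        cond = pi_a + 0.9 * eps * (1.0 if running < pi_a else -1.0)
        omega = rng.random()
        if omega < cond:  # a is first in the enumeration, so y_t = a here
            tau += 1
        if omega < pi_a - eps:
            nu_low += 1
        if omega < pi_a + eps:
            nu_high += 1
    return nu_low, tau, nu_high


if __name__ == "__main__":
    low, tau, high = coupled_counts(0.4, 0.1, 50000)
    print(low / 50000, tau / 50000, high / 50000)
```

Because the same uniform draw decides all three counters in each period, the sandwich inequality between the counts holds path by path, and the law of large numbers for the two outer counters then pins down the frequency of $a$ to within $\epsilon$ of $\pi(a)$ in the limit.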

Proof of Proposition 6.1. Suppose that $\sigma_*$ is a strategy profile that is not a Nash
equilibrium. Then there is some player $i$ and a pure strategy $s^i$ such that $s^i$
is strictly better against $\sigma_*^{-i}$ than is $\sigma_*^i$. Since $u^i(\sigma^i, \sigma^{-i})$ is continuous in both
arguments, we can find an $\epsilon > 0$ so that for all $\sigma^{-i}$ that are within $I\epsilon$ of
$\sigma_*^{-i}$ and $\sigma^i$ that are within $\epsilon$ of $\sigma_*^i$, both in the sup norm (on $\Sigma^{-i}$ and $\Sigma^i$,
respectively),

$$u^i(\sigma^i, \sigma^{-i}) + \epsilon < u^i(s^i, \sigma^{-i}).$$

(Interpret $\sigma^{-i}$ here as any element of $\Sigma^{-i}$; i.e., $\sigma^{-i}$ needn't be based on independent
play by $i$'s rivals. But $\sigma_*^{-i}$ is composed of independent choices by $i$'s
rivals according to the components of the strategy profile $\sigma_*$.)

We claim that for this $\epsilon$ and for all asymptotically empirical assessment rules
$\mu$ and behavior rules $\phi$ that are asymptotically myopic relative to these
assessment rules,

$$P(\|\phi_{t'}(\zeta_{t'}) - \sigma_*\| < \epsilon \text{ for all } t' > t \mid \zeta_t) = 0.$$

To see why, suppose to the contrary that for some $\zeta_t$ this probability is strictly
positive. We proceed to derive a contradiction.

From the lemma, we know that on the set $\Lambda$ of positive probability (conditional
on $\zeta_t$) where $\|\phi_{t'}(\zeta_{t'}) - \sigma_*\| < \epsilon$ for all $t' > t$, the limits inferior and
superior of the empirical frequency distribution $\hat\sigma^{-i}(\zeta_{t'})$ almost surely lie within
$(I-1)\epsilon$ of $\sigma_*^{-i}$. (If $|\phi^j_{t'}(\zeta_{t'})(s^j) - \sigma_*^j(s^j)| < \epsilon$ for all $j$, then for any $s^{-i} = (s^j)_{j \ne i}$,
$|\prod_{j \ne i} \phi^j_{t'}(\zeta_{t'})(s^j) - \prod_{j \ne i} \sigma_*^j(s^j)| < (I-1)\epsilon$.) Since assessments are asymptotically
empirical, along every infinite history in $\Lambda$ there is a $T$ such that for all $t' > T$,
the assessments of player $i$ lie within $I\epsilon$ of $\sigma_*^{-i}$. But then asymptotic myopia
implies that along every $\zeta \in \Lambda$, $\phi^i_{t'}(\zeta_{t'})$ will eventually be more than $\epsilon$ away
from $\sigma_*^i$, which contradicts the definition of $\Lambda$. ■

Two  remarks  about  the  proof  are  in  order. 

(1) Note that the neighborhood of $\sigma_*$ that is used in the proof is independent of
the behavior and assessment rules that are assumed to be given; the value of $\epsilon$
depends only on the strategy profile $\sigma_*$ and the extent to which it is not a Nash
equilibrium.

(2)  The  full  strength  of  asymptotic  empiricism  is  not  required  for  this  proof.  What 
is  essential  is  that  if  behavior  lies  forever  in  some  small  neighborhood  of  a  strategy,  then 
assessments come to lie in another small neighborhood of that strategy. We used the
strong  law  of  large  numbers  to  show  that  the  empirical  frequencies  would  lie 
in  a  small  neighborhood,  and  then  we  were  able  to  enlist  the  asymptotically 
empirical  character  of  assessment  rules.  But  suppose  the  assessment  rule  took 
the following form: $\mu^i_t(\zeta_t)$ is asymptotically equal to the empirical frequency of
observed strategy choices by one's rivals over the most recent $\sqrt{t}$ periods. Since
the  length  of  this  segment  of  history  grows  without  bound,  we  can  still  enlist 
the  strong  law  of  large  numbers  to  come  to  the  desired  conclusion.  Or  suppose 
$\mu^i_t(\zeta_t)$ is a weighted average of past observations, with greatest weight on the
most  recent  observation.  As  long  as  that  "greatest  weight"  falls  off  to  zero  fast 
enough  as  t  goes  to  infinity,  we  can  enlist  a  variation  of  the  strong  law  of  large 
numbers  and  get  the  desired  result.  (Of  course,  this  rules  out  exponential  moving 
averages,  where  the  weight  on  the  most  recent  observation  doesn't  vanish  at  all.) 
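
The three kinds of assessment rule mentioned in this remark can be compared on simulated data. This is a rough sketch under stated assumptions: the rival plays a fixed binary action i.i.d., the window rule uses the most recent $\lfloor\sqrt{t}\rfloor$ observations, and the exponential moving average uses a constant weight of 0.1 on the newest observation. All names and parameter values are illustrative.

```python
import math
import random


def simulate_draws(prob, rounds, seed=1):
    """i.i.d. 0/1 observations of a rival who plays action 1 with probability prob."""
    rng = random.Random(seed)
    return [1 if rng.random() < prob else 0 for _ in range(rounds)]


def assessments(draws, ema_weight=0.1):
    """Return three estimates of the probability of action 1:
    the full-history average, the average over the last floor(sqrt(t))
    observations, and an exponential moving average with constant weight."""
    t = len(draws)
    full = sum(draws) / t
    window = max(1, math.isqrt(t))
    recent = sum(draws[-window:]) / window
    ema = float(draws[0])
    for x in draws[1:]:
        ema = (1.0 - ema_weight) * ema + ema_weight * x
    return full, recent, ema


if __name__ == "__main__":
    print(assessments(simulate_draws(1.0 / 3.0, 250000)))
```

The first two estimates average over a number of observations that grows without bound, which is what lets a law-of-large-numbers argument go through; the third keeps a fixed weight on the latest observation, so its sampling noise does not shrink as the history lengthens.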

We  will  stick  to  asymptotic  empiricism  for  the  remainder  of  this  paper,  since 
it  is  expositionally  the  easiest  thing  to  deal  with.  But  you  should  note  that  it  is 
a  bit  more  restrictive  than  we  actually  need. 

Locally  stable  strategy  profiles 

Proposition  6.1  shows  that  every  strategy  profile  that  is  not  a  Nash  equi- 
librium is  unstable.  We  know  by  example  that  there  are  some  strategy  profiles 
that are not unstable. It is natural to wonder whether any Nash equilibria are
unstable. The answer is no; no Nash equilibrium profile is unstable.

To  avoid  double  negatives,  we  make  the  following  definition. 

Definition. A strategy profile $\sigma_*$ is locally stable if there exist some asymptotically
empirical assessment rules and behavior rules that are asymptotically myopic with respect
to the assessment rules such that for every $\epsilon > 0$, we can find some $t$ and $\zeta_t \in Z_t$
such that

$$P\bigl(\lim_{t' \to \infty} \phi_{t'}(\zeta_{t'}) = \sigma_* \bigm| \zeta_t\bigr) > 1 - \epsilon.$$
We do not insist that behavior converges to the target strategy with probability
one,  but  only  that  the  probability  can  be  made  arbitrarily  close  to  one  (for  a  fixed 
model  of  behavior  and  assessment  rules)  for  some  choice  of  initial  conditions.  We 
couldn't  have  a  probability  one  statement  as  long  as  the  target  strategy  is  not 
pure  (except  for  degenerate  cases),  since  as  long  as  players  use  mixed  strategies, 
there  is  positive  probability  of  a  very  long  run  of  "bad  luck,"  which  would  lead 
players  away  from  the  target  strategy. 

On  the  other  hand,  the  requirement  that  we  be  able  to  make  the  probability 
as  close  to  one  as  we  wish  is  not  as  demanding  as  may  appear.  As  long  as  we 
can  make  the  probability  strictly  positive,  we  know  that  we  can  make  it  as  close 
to  one  as  we  wish. 

Lemma 6.3. Suppose that for a strategy profile $\sigma_*$ there exist some asymptotically
empirical assessment rules and some behavior rules that are asymptotically myopic with
respect to those rules, such that for some $t$ and $\zeta_t \in Z_t$,

$$P\bigl(\lim_{t' \to \infty} \phi_{t'}(\zeta_{t'}) = \sigma_* \bigm| \zeta_t\bigr) > 0.$$

Then $\sigma_*$ is locally stable.

To prove the lemma, let $A$ be the event $\{\lim_{t' \to \infty} \phi_{t'}(\zeta_{t'}) = \sigma_*\}$. By the usual
arguments, this is measurable with respect to the $\sigma$-field generated by the $\{\zeta_1, \zeta_2, \ldots\}$.
Thus by Paul Lévy's zero-or-one law (Chung, 1974, p. 341), the probability of $A$
conditional on $\zeta_{t'}$ approaches the indicator function of $A$ as $t'$ approaches
infinity. Since $A$ has positive probability conditional on $\zeta_t$, for some (positive
probability) $\zeta_{t'}$ that are continuations of $\zeta_t$, the conditional probability of $A$ can
be made as close to one as desired, which gives the result. ■


If you have trouble squaring this assertion with what you find in Chung (1974), recall that $\zeta_{t'}$
"contains" $\zeta_t$ for $t < t'$, so conditioning on $\zeta_{t'}$ is the same as conditioning on $\{\zeta_1, \ldots, \zeta_{t'}\}$.
Proposition 6.4. Every Nash equilibrium profile $\sigma_*$ is locally stable.

Proof. Fix the Nash equilibrium profile $\sigma_*$. By virtue of the lemma, we only
need to find asymptotically myopic behavior rules and asymptotically empirical
assessment rules, and a $t$ and $\zeta_t$ such that

$$P\bigl(\lim_{t' \to \infty} \phi_{t'}(\zeta_{t'}) = \sigma_* \bigm| \zeta_t\bigr) > 0.$$

We  provide  two  constructions  that  work,  one  in  detail  and  one  sketched. 
In  the  first  construction,  we  use  assessment  rules  that  (after  the  first  period)  are 
precisely  empirical,  and  we  rely  on  the  asymptotic  part  of  asymptotic  myopia. 
In  the  second,  we  use  behavior  that  is  precisely  myopic  and  rely  on  the  asymp- 
totic part  of  asymptotic  empiricism.  We  discuss  the  relative  merits  of  these  two 
constructions  at  the  end  of  the  proof. 

For the first construction, create a probability space on which is defined a
sequence of random strategy profiles $\{s_1, s_2, \ldots\}$ where each $s_t$ is independently
and identically distributed according to $\sigma_*$. By the strong law of large numbers,
with probability one the empirical frequencies of strategies and joint strategy
profiles all converge to the corresponding probabilities under $\sigma_*$. Write $S^i_*$ for
the support of $\sigma^i_*$, and let

$$\epsilon(\zeta_t) = \max_{i = 1, \ldots, I} \Bigl[ \max_{s^i \in S^i} u^i(s^i, \hat\sigma^{-i}(\zeta_t)) - \min_{s^i \in S^i_*} u^i(s^i, \hat\sigma^{-i}(\zeta_t)) \Bigr],$$

where $\hat\sigma^{-i}(\zeta_t)$ is shorthand for the empirical frequency distribution of the $s^{-i}$
up to time $t$ along the history $\zeta_t$. That is, $\epsilon(\zeta_t)$ is the maximum amount by
which any $s^i$ in the support of $\sigma^i_*$ is suboptimal against $\hat\sigma^{-i}(\zeta_t)$. Then with
probability one, $\epsilon(\zeta_t)$ goes to zero as $t$ goes to infinity. For $m = 1, 2, \ldots$, let $K_m$
be a positive integer sufficiently large so that the event

$$\epsilon(\zeta_t) < \frac{1}{m} \quad \text{for all } t \ge K_m$$

has probability at least equal to $(2^{m+1} - 1)/2^{m+1}$. For ease of exposition, assume
that $K_{m+1} > K_m$. For each $t = 1, 2, \ldots$, let $m(t) = 0$ if $t < K_1$, let $m(t) = 1$ if
$K_1 \le t < K_2$, and so on. Note that $\lim_{t \to \infty} m(t) = \infty$.
We claim that by construction, the event

$$\epsilon(\zeta_t) < \frac{1}{m(t)}, \quad t = 1, 2, \ldots$$

has probability at least one-half (where the bound is vacuous when $m(t) = 0$). To see why, note that this event can be written
as

$$\bigcap_{m=0}^{\infty} \Bigl\{ \epsilon(\zeta_t) < \frac{1}{m} \text{ for all } t \text{ such that } m(t) = m \Bigr\},$$

and note that the probabilities of the events in this intersection are one for $m = 0$,
3/4 (or more) for $m = 1$, 7/8 (or more) for $m = 2$, and so on. Apply De
Morgan's law to see that the complement of this event is the union of events,
the first of which has probability 0, the second probability no more than 1/4,
the third probability no more than 1/8, and so on. Hence the probability of this
union is 1/2 at most, and the probability of its complement (the event we are
interested in) is at least 1/2.
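
For readers who want the arithmetic spelled out, the bound can be written as a single line. The label $E_m$ is introduced here only for illustration; it stands for the $m$th event in the intersection described in this paragraph.

```latex
% E_m := { eps(zeta_t) < 1/m for all t with m(t) = m },  so  P(E_m^c) <= 2^{-(m+1)} for m >= 1
P\Bigl(\bigcup_{m \ge 1} E_m^{c}\Bigr)
  \;\le\; \sum_{m=1}^{\infty} P\bigl(E_m^{c}\bigr)
  \;\le\; \sum_{m=1}^{\infty} \frac{1}{2^{m+1}}
  \;=\; \frac{1}{4} + \frac{1}{8} + \cdots
  \;=\; \frac{1}{2}.
```

The first inequality is the union bound and the last equality is the sum of a geometric series with ratio 1/2.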

Now we define the assessment and behavior rules. As promised, the assessment
rules are very simple. For $t = 1$, define $\mu^i_1$ arbitrarily, and for $t \ge 2$, let
$\mu^i_t(\zeta_t) = \hat\sigma^{-i}(\zeta_t)$. That is, except for the first period, the players' assessments are
empirical.

As for the behavior rules, for $t$ such that $m(t) = 0$, let $\phi^i_t = \sigma^i_*$. For $t$ such
that $m(t) = m$, let $\phi^i_t(\zeta_t) = \sigma^i_*$ if $\epsilon(\zeta_t) < 1/m$ and $\phi^i_t(\zeta_t)$ = any best response by
$i$ to $\mu^i_t(\zeta_t)$ otherwise. It is obvious that these behavior rules are asymptotically
myopic (even strongly asymptotically myopic), relative to the assessment rules
given previously.
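
The behavior rule of the first construction can be sketched as follows. This is a minimal illustration, assuming payoffs are stored per player as a matrix indexed by (own pure strategy, rivals' joint action), that assessments are the empirical frequencies, and that the value of $m(t)$ is supplied by the caller; the function names and the matching-pennies data in the usage lines are assumptions made for the example.

```python
def expected_payoffs(payoffs, rival_freq):
    """Expected payoff of each own pure strategy against rival_freq."""
    return [sum(u * q for u, q in zip(row, rival_freq)) for row in payoffs]


def suboptimality(payoffs, support, rival_freq):
    """Largest payoff loss from using a strategy in `support` instead of a
    best reply to rival_freq."""
    values = expected_payoffs(payoffs, rival_freq)
    return max(values) - min(values[s] for s in support)


def epsilon(all_payoffs, supports, rival_freqs):
    """Maximum over players of the suboptimality of their target supports."""
    return max(
        suboptimality(u, s, q)
        for u, s, q in zip(all_payoffs, supports, rival_freqs)
    )


def behavior(i, targets, all_payoffs, supports, rival_freqs, m):
    """Player i keeps the target mixture while epsilon < 1/m (always when
    m == 0); otherwise she plays a pure best reply to the empirical assessment."""
    if m == 0 or epsilon(all_payoffs, supports, rival_freqs) < 1.0 / m:
        return list(targets[i])
    values = expected_payoffs(all_payoffs[i], rival_freqs[i])
    best = max(range(len(values)), key=lambda s: values[s])
    return [1.0 if s == best else 0.0 for s in range(len(values))]


if __name__ == "__main__":
    # Matching pennies: player 1 wants to match, player 2 wants to mismatch.
    payoffs = [[[1, -1], [-1, 1]], [[-1, 1], [1, -1]]]
    supports = [[0, 1], [0, 1]]
    targets = [[0.5, 0.5], [0.5, 0.5]]
    freqs = [[0.6, 0.4], [0.5, 0.5]]  # each player's empirical view of her rival
    print(behavior(0, targets, payoffs, supports, freqs, m=2))
    print(behavior(0, targets, payoffs, supports, freqs, m=3))
```

As the tolerance $1/m(t)$ shrinks, the rule tolerates less and less suboptimality of the target mixture, which is the sense in which behavior is asymptotically myopic.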

And for these behavior and assessment rules,

$$P\bigl(\lim_{t \to \infty} \phi_t(\zeta_t) = \sigma_* \bigm| \zeta_1\bigr) \ge \frac{1}{2}.$$

To see this, note that if $\epsilon(\zeta_{t'}) < 1/m(t')$ for $t' = 1, \ldots, t$, then $\phi_t = \sigma_*$. Thus
the probability of the event $\{\epsilon(\zeta_t) < 1/m(t),\ t = 1, 2, \ldots\}$ under the measure
induced by the behavior rules $\phi$ is precisely the same as the probability of this
event under the probability distribution where all strategy profiles are drawn
independently and identically according to the distribution $\sigma_*$. By construction,
this event has probability at least 1/2 under the i.i.d. generated measure, so
it has the same probability (at least 1/2) under the measure generated by the
behavior rules $\phi$. And, of course,

$$\{\epsilon(\zeta_t) < 1/m(t),\ t = 1, 2, \ldots\} = \{\phi_t(\zeta_t) = \sigma_*,\ t = 1, 2, \ldots\}.$$

This  completes  the  proof  by  the  first  construction. 

For the second construction, go back to the standard probability space on
which is defined a sequence of random strategy profiles that are i.i.d. according
to $\sigma_*$, and let

$$\delta(\zeta_t) = \max_{i} \|\hat\sigma^{-i}(\zeta_t) - \sigma_*^{-i}\|.$$

That is, $\delta(\zeta_t)$ is the difference (in the sup norm) between the empirical frequencies
and the target strategy. From the strong law of large numbers, $\lim_{t \to \infty} \delta(\zeta_t) = 0$
almost surely. So for $n = 1, 2, \ldots$, let $L_n$ be a positive integer sufficiently large
so that the event

$$\delta(\zeta_t) < \frac{1}{n} \quad \text{for all } t \ge L_n$$

has probability at least equal to $(2^{n+1} - 1)/2^{n+1}$. Assume that $L_{n+1} > L_n$. For
each $t = 1, 2, \ldots$, let $n(t) = 0$ if $t < L_1$, let $n(t) = 1$ if $L_1 \le t < L_2$, and so on.
Note that $\lim_{t \to \infty} n(t) = \infty$.

Now we give the assessment rules. For any history $\zeta_t$, let

$$\mu^i_t(\zeta_t) = \begin{cases} \sigma_*^{-i}, & \text{if } \|\hat\sigma^{-i}(\zeta_t) - \sigma_*^{-i}\| < 1/n(t), \text{ and} \\ \hat\sigma^{-i}(\zeta_t), & \text{otherwise.} \end{cases}$$

That is, player $i$ believes that her rivals are playing (independently) according to
$\sigma_*^{-i}$ unless and until the accumulated evidence against this hypothesis becomes
severe (as measured by $1/n(t)$), at which point the player reverts to empirical
beliefs. These assessment rules are clearly asymptotically empirical. Behavior, as
promised, is precisely myopic behavior; but it still remains to specify what each
player does when more than one strategy is optimal. In such cases, for $\zeta_t$ such
that $\mu^i_t(\zeta_t) = \sigma_*^{-i}$, the player should use the strategy $\sigma^i_*$; otherwise, the player
can make any selection desired (say, choose the optimal strategy of lowest index
for some fixed enumeration of $S^i$).
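
The assessment rule of the second construction is simple enough to state as a function. The sketch below assumes distributions are represented as lists of probabilities over the rivals' joint actions and that the caller supplies $n(t)$; the function names are illustrative.

```python
def sup_distance(p, q):
    """Sup-norm distance between two distributions on the same finite set."""
    return max(abs(a - b) for a, b in zip(p, q))


def assessment(empirical, target, n_t):
    """Believe the target distribution while the empirical frequencies are
    within 1/n_t of it (always when n_t == 0); otherwise revert to the
    empirical frequencies."""
    if n_t == 0 or sup_distance(empirical, target) < 1.0 / n_t:
        return list(target)
    return list(empirical)


if __name__ == "__main__":
    target = [0.5, 0.5]
    print(assessment([0.45, 0.55], target, n_t=4))  # evidence still mild
    print(assessment([0.90, 0.10], target, n_t=4))  # evidence now severe
```

Because $1/n(t)$ goes to zero, an assessment that sticks with the target is eventually within an arbitrarily small distance of the empirical frequencies, so the rule is asymptotically empirical in either branch.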

With these behavior and assessment rules, the players begin with $\sigma_*$, and
the sequence $\{L_n\}$ has been constructed precisely so that, with probability one-half
or more, the players continue to use $\sigma_*$ forever. The proof of this is like
that for the first construction. ■

Remarks  on  the  two  constructions.  In  both  of  these  constructions,  players  use  pre- 
cisely the  equilibrium  strategy  for  no  positive  reason  at  all.  The  second  construc- 
tion has  the  comparative  advantage  of  not  relying  on  the  flexibility  provided 
by  asymptotic  myopia,  as  the  mixed  strategy  is  an  optimal  choice  (albeit  one 
of  many)  given  the  beliefs.  However  the  second  construction  does  rely  on  the 
flexibility  afforded  by  asymptotic  empiricism:  For  no  obvious  reason,  players  be- 
lieve that  their  opponents  are  playing  exactly  the  equilibrium  strategy,  and  they 
continue  to  believe  this  unless  and  until  the  evidence  to  the  contrary  becomes 
overwhelming. 

Thus,  although  the  constructions  show  that  convergence  to  a  mixed-strategy 
equilibrium  is  possible,  neither  one  seems  particularly  plausible.  This  concurs 
with  our  intuition,  as  mixed-strategy  equilibria  are  hard  (for  us)  to  defend  in  the 
stark  environment  of  this  chapter.  Instead,  we  tend  to  follow  Harsanyi  (1973)  in 
interpreting  mixed-strategy  equilibria  as  a  shorthand  description  of  pure-strategy 
equilibria  in  games  where  parameters  of  the  game  (such  as  the  players'  payoffs) 
are  subject  to  small  random  perturbations  which  are  private  information.    Sec- 
tions 7 and 8 consider learning in this context and argue that "mixed-strategy"
outcomes  are  indeed  plausible  when  interpreted  in  this  way. 

Asymptotically  myopic  behavior  and  compatible  histories 

Before moving to this development, we have one piece of pending business
to  take  care  of. 

In  both  Propositions  4.1  and  4.2,  we  assumed  that  the  players'  behavior  rules 
were  strongly  asymptotically  myopic  relative  to  their  assessment  rules,  whereas  in 
Proposition  6.1,  we  assumed  that  behavior  was  (only)  asymptotically  myopic.  We 
earlier  indicated  why  the  results  in  Section  4  would  not  work  with  asymptotically 
myopic  behavior;  viz.,  compatibility  of  a  history  and  a  profile  of  behavior  rules 
is (too) weak. In order to get results in the spirit of Section 4 without strong
asymptotic myopia, we must work with the sort of probabilistic convergence
criteria  used  in  this  section. 

Because we do not assume (in Propositions 4.1 and 4.2) that strategies con-
verge, to  avoid  problems  of  correlation  we  must  restrict  attention  to  the  case  of 
two  players.  Also,  because  strategies  are  not  assumed  to  converge,  we  cannot 
use  the  definitions  of  unstable  and  locally  stable  strategy  profiles  given  above. 
Instead,  for  fixed  behavior  and  assessment  rules,  we  make  the  following  defini- 
tion. 

Definition. A strategy profile $\sigma_* \in \Sigma$ is xunstable if there exists some $\epsilon > 0$ such
that for all $t$ and $\zeta_t$, $P(\|\hat\sigma^i(\zeta_{t'}) - \sigma^i_*\| < \epsilon \text{ for all } t' > t \text{ and } i = 1, 2 \mid \zeta_t) = 0$.
The x in front of xunstable is not a typo; this is to distinguish this definition from
the definition of an unstable strategy profile given earlier, in which intended (as
opposed to empirically observed) play has probability zero of remaining in a
small neighborhood of $\sigma_*$. Note that this captures some of the spirit of Proposition
4.2, in that it asks for empirical frequencies to remain close to a target profile
$\sigma_*$. At the same time, it is a probabilistic statement about the likelihood of this
event.
Proposition 6.5. For asymptotically empirical assessment rules $\mu^i$ and behavior rules
$\phi^i$ that are asymptotically myopic with respect to the assessment rules, every strategy
profile $\sigma_*$ that is not a Nash equilibrium is xunstable.

We omit the proof. The idea is that if empirical (marginal) frequencies lie close
to $\sigma_*$, then so must beliefs. But if beliefs are close to $\sigma_*$, and $\sigma_*$ is not a Nash
equilibrium, then for some $i$ and some pure strategy $s^i$ in the support of $\sigma^i_*$,
$s^i$ will be used with vanishing probability. And then (by the strong law of large
numbers) the frequency of $s^i$ must fall to zero, which contradicts the hypothesis
that empirical frequencies stay close to $\sigma^i_*$ (which puts positive weight on $s^i$).

7.  Learning  in  games  with  randomly  perturbed  payoffs 

Although Proposition 6.4 shows that all Nash equilibria are locally stable,
including  equilibria  in  mixed  strategies,  we  have  suggested  that  convergence  of 
intended  behavior  to  a  mixed-strategy  profile  in  the  standard  model  seems  im- 
plausible, as  it  requires  that  players  use  just  the  right  mixed  strategy  whenever 
they  are  indifferent,  and  it  is  not  apparent  why  they  should  or  would  choose 
to  do  so.  Of  course,  this  apparent  drawback  of  mixed-strategy  equilibria  is  not 
special  to  our  learning-theoretic  approach,  but  arises  whenever  mixed  strategies 
are  considered.  In  response  to  this  problem,  Harsanyi  (1973)  proposed  that  the 
mixed-strategy equilibria of a game could be interpreted as pure-strategy equi-
libria of  a  related  game  of  incomplete  information,  in  which  each  player's  payoff 
is  randomly  perturbed  by  a  stochastic  shock  which  is  private  information.  For 
example,  a  mixed  strategy  placing  probability  1/3  on  one  pure  strategy  and  2/3 
on  another  corresponds  to  a  situation  in  which  the  payoff  shocks  and  opponents' 
strategies  are  such  that  the  player  has  a  strict  preference  for  the  first  strategy 
when  his  payoff  perturbation  comes  from  a  set  with  probability  1/3  and  strict 
preference for the second when his payoff perturbation comes from the
complementary set. Harsanyi showed that for generic strategic-form payoffs, every
mixed-strategy equilibrium can be "purified" by any small and well-behaved
payoff perturbations.

In  the  spirit  of  Harsanyi's  work,  this  section  extends  our  assumptions  on 
behavior  and  notions  of  convergence  to  games  in  which  the  players'  payoffs 
are  subject  to  an  i.i.d.  sequence  of  payoff  perturbations.  As  we  will  see,  this 
allows us to construct a more satisfactory model of learning to play a
mixed-strategy equilibrium. The concluding section then examines the question of global
convergence  to  a  mixed  equilibrium  in  2  x  2  games. 

The  model 

Consider $I$ players $i = 1, \ldots, I$ playing a strategic-form game at times $t = 1, 2, \ldots$,
where the action spaces $A^i$ are the same in each period, but the payoffs
are subject to random and privately observed shocks. Specifically, the payoff to
player $i$ from the action profile $a = (a^i, a^{-i})$ in period $t$ is $u^i_t(a) = v^i(a) + e^i_t(a^i)$.
This is the augmented or perturbed version of the underlying game, which is the
game where the payoff functions are simply the $v^i$. We call $e^i_t = (e^i_t(a^i))_{a^i \in A^i}$ the
date-$t$ perturbation of player $i$'s payoffs. We assume that for each $i$ the
$\{e^i_t ; t = 1, 2, \ldots\}$ are independent and identically distributed, and that the perturbations
of different players are independent. We denote the probability distribution of
each $e^i_t$ by $\rho^i$, and we denote its support, which we suppose is compact, by
$E^i \subset R^{A^i}$.

Each period, each player $i$ observes the shock $e^i_t$ to her own payoffs, but
does not observe the shocks to her opponents' payoff functions. Hence in the
stage game, a pure strategy for player $i$ is a map from $E^i$ to $A^i$. (We will not
need to consider mixed strategies in the augmented game.)

To model learning in the stage game, we suppose that at date $t$ player $i$
knows the sequence of (pure) action profiles that have occurred in the past, the
past shocks to her own payoffs, and also her current payoff perturbation $e^i_t$;
player $i$ does not learn the past payoff shocks of her opponents. We adopt the
following notation:

(a) We use $A = \prod_{i=1}^{I} A^i$ to denote the space of action profiles, with typical
element $a$. Profiles of actions by all players except $i$ are denoted by $a^{-i} \in A^{-i}$.
Probability distributions over actions by player $i$ are denoted by $\alpha^i \in \mathcal{A}^i$, and
probability distributions over $A^{-i}$ (which can reflect correlations in the actions
of $i$'s opponents) are denoted by $\alpha^{-i} \in \mathcal{A}^{-i}$.

(b) Histories of actions up to time $t$ are denoted by $\zeta_t \in Z_t$; i.e.,
$\zeta_t = (a_1, \ldots, a_{t-1}) \in (A)^{t-1} = Z_t$. Complete histories of actions are denoted by $\zeta \in Z$.

(c) We write $\alpha_t(\zeta_t)$ to denote the empirical distribution of action profiles up to time
$t$ along the history $\zeta_t$, and we use $\alpha^{-i}_t(\zeta_t)$ to denote the empirical distribution of
action profiles by $i$'s opponents up to time $t$ along the history $\zeta_t$.

(d) In addition to $\zeta_t$, at time $t$ player $i$ knows her own history of payoff
perturbations up to and including time $t$, or $(e^i_1, \ldots, e^i_t)$. We use $\xi^i_t \in X^i_t$ to denote
the vector of all this information; i.e., $\xi^i_t$ looks like $(\zeta_t, (e^i_1, \ldots, e^i_t))$. Dropping the
subscript $t$, $\xi^i$ denotes a complete history for player $i$ of action profiles by all
players at all dates and all of $i$'s payoff perturbations; dropping the superscript
$i$, as in $\xi_t$ and $\xi$, denotes (respectively) a time $t$ and complete history of plays
and all the players' payoff perturbations.

(e) A behavior rule for player $i$ is denoted by $\phi^i = (\phi^i_1, \phi^i_2, \ldots)$, where
$\phi^i_t : X^i_t \to A^i$.$^{24}$

(f) An assessment rule for player $i$ is denoted by $\mu^i = (\mu^i_1, \mu^i_2, \ldots)$, where
$\mu^i_t : Z_t \to \mathcal{A}^{-i}$.

All of this is a straightforward extension of our earlier model. In particular,
$i$'s assessment rule gives her predictions how her opponents will play at each
date, based on history so far. In this regard, note that the domain of $\mu^i_t$ is $Z_t$ and
not $X^i_t$. We are assuming that players other than $i$ never observe $i$'s payoff
perturbations, so it seems sensible that $i$ would not have assessments of the
actions of her opponents depending on her payoff perturbations. Nonetheless, at
the cost of some notational complexity, we could assume that $i$'s assessments at
date $t$ depend on all of $\xi^i_t$, as long as asymptotic empiricism is properly defined.

Given behavior rules for all the players, and given the exogenous probability
distributions on payoff perturbations, we can construct the induced conditional
probability distribution, conditional on $\xi_t$, on the space $X$. We use $P(\cdot \mid \xi_t)$ to
denote this probability distribution.

24 Measurability of the behavior rules is always assumed.

Definition. For augmented games:

(a) The assessment rule $\mu^i$ is asymptotically empirical if

$$\lim_{t \to \infty} \| \mu^i_t(\zeta_t) - \alpha^{-i}_t(\zeta_t) \| = 0$$

for every $\zeta \in Z$.

(b) The behavior rule $\phi^i$ is asymptotically myopic relative to $\mu^i$ if for some sequence
of nonnegative numbers $\{\epsilon_t\}$ converging to zero, $i$'s choice of action at every $\xi^i_t$ is at
most $\epsilon_t$ suboptimal against $\mu^i_t(\zeta_t)$.$^{25,26}$

Nash equilibria of the augmented game

Before examining learning in the context of this model, we review the
structure of Nash equilibria in the augmented (stage) game.

A Nash equilibrium of the augmented game is, as usual, a strategy profile
such that each player's chosen strategy $s^i(\cdot)$ ($: E^i \to A^i$) maximizes her expected
payoff given the strategies of her opponents, or equivalently that for $\rho^i$-almost
every $e^i$, $a^i = s^i(e^i)$ maximizes the expectation (over $e^{-i}$) of
$v^i(a^i, s^{-i}(e^{-i})) + e^i(a^i)$.

25 Suboptimality here is measured given the period $t$ payoff perturbation. We believe that nothing
of interest changes with a weaker definition in which suboptimality is measured averaging over $e^i_t$,
but the proofs are somewhat more involved.

26 Throughout, we are loose in our notation, taking as understood things such as: in (a), $\zeta_t$
denotes the date $t$ subhistory of the fixed $\zeta$; and in (b), $\zeta_t$ is the actions-profile part of $\xi^i_t$.

Assumption 7.1. For each $i$, the distribution $\rho^i$ is absolutely continuous with respect
to Lebesgue measure on $R^{A^i}$.

This assumption simplifies the analysis, because it implies that for any
distribution of the opponents' actions, $i$ has a strict preference for one of her actions at
$\rho^i$-almost every $e^i$.

Lemma 7.2. For every $\alpha^{-i} \in \mathcal{A}^{-i}$, the set of $e^i$ for which

$$\arg\max_{a^i \in A^i} \sum_{a^{-i}} \left[ v^i(a^i, a^{-i}) + e^i(a^i) \right] \alpha^{-i}(a^{-i})$$

is a singleton has measure one under $\rho^i$.

We omit the proof, which is based on the observation that the complement of
this set lies in a finite union of lower-dimensional hyperplanes. Note that this is
true whether $\alpha^{-i}$ reflects independent or correlated play by $i$'s rivals.

For each $e^i$ and $\alpha^{-i}$, let $b^i(e^i, \alpha^{-i})$ specify some best response for $i$ to $\alpha^{-i}$
when her payoff perturbation is $e^i$, and let $\beta^i(\alpha^{-i})$ be the distribution that $b^i$
induces on player $i$'s actions:

$$\beta^i(\alpha^{-i})(a^i) = \rho^i\{ e^i \in E^i : b^i(e^i, \alpha^{-i}) = a^i \}.$$

(Lemma 7.2 shows that $\beta^i$ is well defined, since for every $\alpha^{-i}$, $b^i$ is uniquely
determined for $\rho^i$-almost every $e^i$.)
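
As a rough illustration of the map $\beta^i$, the sketch below estimates the induced action distribution by sampling shock vectors and recording which action maximizes the perturbed expected payoff. The payoff matrix `v1`, the uniform shock distribution, and the function name `best_response_distribution` are hypothetical choices made for this example only; they are not taken from the text.

```python
import random

def best_response_distribution(v_i, alpha_opp, shocks):
    """Monte Carlo estimate of beta^i(alpha^{-i}) from sampled shock vectors e^i.

    v_i[a][k] is player i's unperturbed payoff from own action a against
    opponent profile k; alpha_opp[k] is the assessed probability of profile k.
    """
    n_actions = len(v_i)
    # Expected unperturbed payoff of each own action against alpha^{-i}.
    expected = [sum(p * v for p, v in zip(alpha_opp, row)) for row in v_i]
    counts = [0] * n_actions
    for e in shocks:
        perturbed = [expected[a] + e[a] for a in range(n_actions)]
        # Ties have probability zero under an absolutely continuous shock law.
        counts[perturbed.index(max(perturbed))] += 1
    return [c / len(shocks) for c in counts]

# Hypothetical example: shocks i.i.d. uniform on [-0.1, 0.1]^2 (compact support).
rng = random.Random(0)
shocks = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(1000)]
v1 = [[1.0, 0.0], [0.0, 1.0]]
beta_pure = best_response_distribution(v1, [1.0, 0.0], shocks)
beta_mixed = best_response_distribution(v1, [0.5, 0.5], shocks)
```

When the opponent is believed to play her first action for sure, the shocks are too small to overturn the payoff difference and all mass falls on one action; at the belief that makes the unperturbed payoffs equal, the shocks alone decide the choice and both actions receive positive weight.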

It  is  straightforward  to  prove  the  following  technical  result. 

Lemma 7.3. The function $\beta^i$ is continuous.

For each $i$ and strategy $s^i$, let $\pi^i(s^i)$ denote the distribution on $A^i$
induced by $s^i$; i.e., $\pi^i(s^i)(a^i) = \rho^i\{ e^i \in E^i : s^i(e^i) = a^i \}$. For $s^{-i}$ a profile of
strategies for $i$'s opponents (that is, $s^{-i} = (s^j)_{j \ne i}$, where $s^j : E^j \to A^j$), let
$\pi^{-i}(s^{-i})$ denote the distribution on $A^{-i}$ induced if $i$'s rivals use the strategies
in $s^{-i}$. Note that $\pi^{-i}(s^{-i}) \in \mathcal{A}^{-i}$ is a product measure (since the various $e^j$ are
independent of each other).

Lemma 7.4. (a) A strategy profile $s$ is a Nash equilibrium of the stage game if and only
if, for each player $i$ and $\rho^i$-almost all $e^i$, $s^i(e^i) = b^i(e^i, \pi^{-i}(s^{-i}))$.

(b) If $(\alpha^1, \ldots, \alpha^I) \in \mathcal{A}^1 \times \ldots \times \mathcal{A}^I$ satisfies $\beta^i(\alpha^{-i}) = \alpha^i$ for all $i$ (where it is
understood that $\alpha^{-i}$ is the product measure in $\mathcal{A}^{-i}$ whose margins are the various $\alpha^j$
for $j \ne i$), then every strategy profile $s$ such that $s^i(e^i) = b^i(e^i, \alpha^{-i})$ for all $i$ and
$\rho^i$-almost every $e^i$ is a Nash equilibrium.

This is largely a matter of marshalling definitions, hence the proof is omitted.
This lemma shows that to analyze Nash equilibria it suffices to work with
the induced marginal distributions over actions, which motivates the following
definition.

Definition. The vector of marginal distributions $\alpha = (\alpha^1, \ldots, \alpha^I) \in \mathcal{A}^1 \times \ldots \times \mathcal{A}^I$ is
a Nash distribution if $\beta^i(\alpha^{-i}) = \alpha^i$ for all $i$.

Local  stability 

As one would expect, our results about the relationship between Nash
equilibrium and local stability carry over to the context of augmented games. Since
all that each player observes about the others, and all that matters for a player's
decisions, are the actions chosen, we define stability and instability of behavior
rules $\phi$ in terms of the induced distributions on actions $\pi^i(\phi^i_t(\xi^i_t))$.

Definition. A profile $\alpha_* \in \mathcal{A}^1 \times \ldots \times \mathcal{A}^I$ is unstable if there exists some $\epsilon > 0$ such
that for all $t$ and $\xi_t$,

$$P\left( \| \pi^i(\phi^i_{t'}(\xi^i_{t'})) - \alpha^i_* \| < \epsilon \text{ for all } t' > t \text{ and } i \;\middle|\; \xi_t \right) = 0.$$

Proposition 7.5. Fix asymptotically empirical assessment rules $\mu^i$ and behavior rules
that are asymptotically myopic relative to the $\mu^i$. Then if $\alpha_*$ is not a Nash distribution,
$\alpha_*$ is unstable.

The proof uses the following lemma, which shows that as time passes,
asymptotically myopic behavior converges to myopic behavior, in the sense that the
induced distributions over outcomes become close.

Lemma 7.6. For any $\delta > 0$ there exists an $\epsilon > 0$ such that, for any beliefs $\alpha^{-i}$ and
for any $s^i$ that $\epsilon$-maximizes player $i$'s payoff against $\alpha^{-i}$ for $\rho^i$-almost every $e^i$,

$$\| \pi^i(s^i) - \beta^i(\alpha^{-i}) \| < \delta.$$

Proof. We will show that for any $\delta$ there exists an $\epsilon$ such that for all $\alpha^{-i}$, the
set of $e^i$ for which player $i$ has more than one $\epsilon$-best response has measure
(under $\rho^i$) no greater than $\delta$. To see this, fix some $\alpha^{-i}$ and note that the set
of $e^i$ that make player $i$ indifferent between any two given actions lies on a
lower-dimensional hyperplane, and there are (at most) $(\#A^i)^2$ such hyperplanes,
where $\#A^i$ denotes the cardinality of $A^i$. Put a "sleeve" of diameter $\epsilon$ around
each of these hyperplanes, and the set of $e^i$ where player $i$ has multiple $\epsilon$-best
responses is contained in the union of these sleeves. In the compact set $E^i$, there
is a uniform (in $\alpha^{-i}$) upper bound on the Lebesgue measure of the union of $(\#A^i)^2$
sleeves of this sort, and this uniform upper bound goes to zero as $\epsilon$ goes to zero.
Because $\rho^i$ is absolutely continuous with respect to Lebesgue measure, there is
thus a uniform upper bound on the $\rho^i$-measure of these sets, going to zero as $\epsilon$
goes to zero, which establishes the result. ■

Proof of Proposition 7.5. Fix an $\alpha_*$ that is not a Nash distribution. Without loss
of generality, suppose that $\beta^1(\alpha^{-1}_*) \ne \alpha^1_*$. Let $\delta = \| \beta^1(\alpha^{-1}_*) - \alpha^1_* \|$. Since $\beta^1$ is
continuous, we can find $\epsilon'$ sufficiently small that $\| \beta^1(\alpha^{-1}) - \beta^1(\alpha^{-1}_*) \| < \delta/4$ for
all $\alpha^{-1}$ within $2\epsilon'$ of $\alpha^{-1}_*$.

We will show that profile $\alpha_*$ is unstable for $\epsilon = \min(\epsilon'/2, \delta/4)$. Suppose not,
so that for some history $\xi_t$,

$$P\left( \| \pi^i(\phi^i_{t'}(\xi^i_{t'})) - \alpha^i_* \| < \epsilon \text{ for all } t' > t \text{ and for } i = 1, \ldots, I \;\middle|\; \xi_t \right) > 0.$$

Lemma 6.2 then implies that (almost surely on this event of positive probability)
the empirical marginal frequencies of actions eventually lie within $\epsilon$ of the $\alpha^i_*$.
Then because assessments are asymptotically empirical, we conclude that (almost
surely on this event), $\| \mu^1_{t'}(\zeta_{t'}) - \alpha^{-1}_* \| < 2\epsilon \le \epsilon'$. This in turn implies that the
distribution on actions $\beta^1(\mu^1_{t'}(\zeta_{t'}))$ induced by the myopic best response to $\mu^1_{t'}(\zeta_{t'})$
is within $\delta/4$ of $\beta^1(\alpha^{-1}_*)$.

From Lemma 7.6, there is one $\epsilon''$ such that the set of $\epsilon''$-best responses to
$\mu^1_{t'}(\zeta_{t'})$ is within $\delta/4$ of $\beta^1(\mu^1_{t'}(\zeta_{t'}))$ for any $\zeta_{t'}$. Let $t'$ be large enough that
the suboptimization allowed for by player 1's decision rule is less than this $\epsilon''$.
The triangle inequality then implies that the marginal distribution over player
1's actions is within $\delta/2$ of $\beta^1(\alpha^{-1}_*)$, and hence at least $\delta/2 > \epsilon$ away from $\alpha^1_*$,
which contradicts the hypothesis. ■

The  next  step  in  parallel  with  Section  6  is  to  note  that  every  Nash  equilibrium 
is  locally  stable  for  some  asymptotically  empirical  assessments  and  asymptotically 
myopic  behavior  rules.  This  can  be  most  easily  shown  by  adapting  the  second 
construction  in  the  proof  of  Proposition  6.2,  in  which  players  believe  that  the 
distribution  over  their  rivals'  actions  corresponds  to  the  equilibrium  unless  and 
until  they  receive  sufficient  evidence  otherwise.  With  these  beliefs,  the  arbitrary 
nature  of  the  players'  behavior  rules  is  eliminated;  rather  than  just  happening  to 
mix  in  the  way  the  equilibrium  prescribes,  the  players  have  a  strict  preference 
for the behavior they choose. Of course, the players' beliefs are still cooked to
favor the equilibrium, so we do not yet have a really satisfactory explanation of
how  players  might  learn  to  play  a  mixed  equilibrium.  The  final  section  provides 
such  an  explanation  for  a  special  class  of  two-player  games. 

8. Global convergence in a class of 2 x 2 games

This  section  will  show  by  example  how  learning  in  augmented  games  can 
lead  to  a  mixed  equilibrium  even  when  the  assumed  behavior  and  assessment 
rules  do  not  build  in  an  arbitrary  predilection  for  equilibrium  play.  To  this  end, 
we will restrict attention to behavior and assessments that take precisely the form
of  fictitious  play,  as  specified  in  (A)  through  (D)  of  Section  3.  Moreover,  we 
will  do  more  than  show  that  convergence  to  a  mixed  equilibrium  can  occur 
even  when  the  equilibrium  is  not  artificially  built  in  the  behavior  rules:  In  the 
games  we  consider,  play  will  converge  to  the  (augmented  version)  of  the  mixed 
equilibrium  with  probability  one,  regardless  of  the  initial  beliefs  of  the  players. 

We  do  not  aim  for  very  general  results  here.  Rather,  we  content  ourselves 
with  the  special  case  of  2  x  2  games  that  (before  being  augmented)  have  a  unique 
Nash  equilibrium,  which  moreover  is  completely  mixed.  At  the  end  of  the  section, 
we  speculate  about  possible  extensions  that  would  provide  a  sufficient  condition 
for  local  stability  of  mixed  equilibria  under  fictitious  play  in  other  augmented 
games. We suspect, however, that convergence cannot be guaranteed for
general augmented games; we conjecture that an augmented version of Shapley's
example  will  provide  the  desired  counterexample,  but  we  have  not  verified  this. 

The  remainder  of  this  section  will  discuss  and  prove  the  following  result. 

Proposition  8.1.  Take  any  2x2  game  that  has  a  unique,  completely  mixed  Nash 
equilibrium,  and  consider  any  augmentation  that  satisfies  Assumption  8.3  given  below. 
If  behavior  rules  and  assessments  are  as  in  the  model  of  fictitious  play,  the  induced 
marginal  distributions  on  actions  converge,  with  probability  one,  to  the  unique  Nash 
distributions of the augmented game.

Previous results about the global convergence of behavior in learning
processes have focussed on games that are solvable by iterated strict dominance
(Moulin,  1984;  Guesnerie,  1991;  Milgrom  and  Roberts,  1990;  Borgers  and  Janssen, 
1991).  In  contrast,  the  augmented  games  we  consider  are  not  dominance  solvable. 

Preliminaries 

Fix a 2 x 2 augmented game with expected payoffs $(v^1, v^2)$ and payoff
perturbation vectors $e^1$ and $e^2$. Write the action sets for each player $A^i = \{1, 2\}$,
so that, for example, $v^2(1,2)$ is 2's payoff if he chooses column 2 and player 1
chooses row 1. (Player 1's choice of row is listed first.) Assume that the game
$(v^1, v^2)$ has a unique Nash equilibrium that is completely mixed.

Because $(v^1, v^2)$ has a unique Nash equilibrium that is completely mixed,
$(v^1, v^2)$ has a strict best-response cycle. Rearrange rows, if necessary, so that this
best-response cycle is counter-clockwise. That is, $v^1(1,1) < v^1(2,1)$, $v^2(2,1) < v^2(2,2)$,
$v^1(2,2) < v^1(1,2)$, and $v^2(1,2) < v^2(1,1)$.

Let $F^1(z)$ be the probability that, on any given date,
$(e^1(2) - e^1(1)) / (v^1(1,2) - v^1(2,2)) \le z$. This probability is derived from the distribution function $\rho^1$ in the
obvious fashion. Note that $F^1$ is continuous on $R^1$.

Let $F^2(z)$ be the probability that, on any given date,
$(e^2(1) - e^2(2)) / (v^2(2,2) - v^2(2,1)) \le z$. Note that $F^2$ is continuous.

We want to compute $\beta^1(\alpha^2)$, the marginal probability that 1 plays row 1
if she assesses probability $\alpha^2$ that 2 plays column 1. If she plays row 1, her
expected payoff is $\alpha^2 v^1(1,1) + (1 - \alpha^2) v^1(1,2) + e^1(1)$, while row 2 nets for her
$\alpha^2 v^1(2,1) + (1 - \alpha^2) v^1(2,2) + e^1(2)$. Simple algebra shows that the former is greater
if

$$\frac{e^1(2) - e^1(1)}{v^1(1,2) - v^1(2,2)} < 1 - \alpha^2 \left[ 1 + \frac{v^1(2,1) - v^1(1,1)}{v^1(1,2) - v^1(2,2)} \right].$$

Define

$$x = 1 + \frac{v^1(2,1) - v^1(1,1)}{v^1(1,2) - v^1(2,2)}.$$

Then row 1 is chosen whenever $(e^1(2) - e^1(1)) / (v^1(1,2) - v^1(2,2)) < 1 - x\alpha^2$. This
gives

$$\beta^1(\alpha^2) = F^1(1 - x\alpha^2).$$

Note that $x > 1$ and (thus) that $\beta^1$ is a nonincreasing function of $\alpha^2$.
A similar computation will show that if we define

$$y = 1 + \frac{v^2(1,1) - v^2(1,2)}{v^2(2,2) - v^2(2,1)},$$

then

$$\beta^2(\alpha^1) = 1 - F^2(1 - y\alpha^1).$$

Note that $y > 1$ and (thus) that $\beta^2$ is a nondecreasing function of $\alpha^1$.
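
The formulas for $x$, $y$, $\beta^1$ and $\beta^2$ can be checked on a concrete case. The sketch below uses a hypothetical payoff table that satisfies the counter-clockwise cycle conditions, and it assumes, purely for illustration, that both $F^1$ and $F^2$ are the uniform distribution function on $[-1, 1]$; neither the payoffs nor this choice of $F^i$ comes from the text.

```python
def uniform_cdf(z, w=1.0):
    """Assumed F^i: distribution function of a uniform variable on [-w, w]."""
    return min(1.0, max(0.0, (z + w) / (2.0 * w)))

# Hypothetical payoffs v^i(row, column); the row (player 1's action) is listed first.
v1 = {(1, 1): 0.0, (2, 1): 2.0, (2, 2): 0.0, (1, 2): 1.0}
v2 = {(2, 1): 0.0, (2, 2): 1.0, (1, 2): 0.0, (1, 1): 1.0}

# Counter-clockwise strict best-response cycle, as required in the text.
assert v1[1, 1] < v1[2, 1] and v2[2, 1] < v2[2, 2]
assert v1[2, 2] < v1[1, 2] and v2[1, 2] < v2[1, 1]

x = 1.0 + (v1[2, 1] - v1[1, 1]) / (v1[1, 2] - v1[2, 2])
y = 1.0 + (v2[1, 1] - v2[1, 2]) / (v2[2, 2] - v2[2, 1])

def beta1(alpha2):
    """Probability that player 1 plays row 1 given assessment alpha2."""
    return uniform_cdf(1.0 - x * alpha2)

def beta2(alpha1):
    """Probability that player 2 plays column 1 given assessment alpha1."""
    return 1.0 - uniform_cdf(1.0 - y * alpha1)
```

In this example $x = 3$ and $y = 2$, so $\beta^1$ falls from 1 to 0 as $\alpha^2$ rises while $\beta^2$ rises from 0 to 1 as $\alpha^1$ rises, which is the monotonicity used in the uniqueness argument.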

Lemma 8.2. Fix any 2 x 2 game with a unique Nash equilibrium that is strictly mixed.
Then every augmented version of the game that satisfies Assumption 7.1 has unique
Nash distributions, and thus has Nash equilibrium strategies that are essentially unique.

Proof. Nash distributions are pairs $(\alpha^1_*, \alpha^2_*)$ where $\beta^i(\alpha^{-i}_*) = \alpha^i_*$ for $i = 1, 2$.
To show existence of a solution to these two equations, note that
$(\alpha^1, \alpha^2) \mapsto (\beta^1(\alpha^2), \beta^2(\alpha^1))$ is a continuous function mapping the unit square into itself, and
use Brouwer's fixed point theorem. To show uniqueness, suppose that $(\alpha^1_*, \alpha^2_*)$
and $(\alpha^1_{**}, \alpha^2_{**})$ are two Nash distributions. Without loss of generality, assume
$\alpha^1_* \ne \alpha^1_{**}$ and, in fact, $\alpha^1_* > \alpha^1_{**}$. Because $\beta^2$ is nondecreasing, this implies that
$\alpha^2_* \ge \alpha^2_{**}$. And because $\beta^1$ is nonincreasing, this implies that $\alpha^1_* \le \alpha^1_{**}$, a
contradiction. ■
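
The monotonicity argument in the proof also suggests a simple way to locate the unique Nash distribution numerically: the composite map $\alpha^1 \mapsto \beta^1(\beta^2(\alpha^1)) - \alpha^1$ is strictly decreasing, so bisection applies. The sketch below is a minimal illustration under assumed values $x = 3$, $y = 2$ and uniform $F^i$ on $[-1, 1]$; these numbers are hypothetical and are not taken from the text.

```python
def F(z, w=1.0):
    """Assumed distribution function: uniform on [-w, w]."""
    return min(1.0, max(0.0, (z + w) / (2.0 * w)))

X, Y = 3.0, 2.0  # hypothetical values of x and y

def beta1(alpha2):
    return F(1.0 - X * alpha2)

def beta2(alpha1):
    return 1.0 - F(1.0 - Y * alpha1)

def nash_distribution(tol=1e-12):
    """Bisection on g(a) = beta1(beta2(a)) - a, which is strictly decreasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta1(beta2(mid)) - mid > 0.0:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid   # the root lies at or to the left of mid
    alpha1 = 0.5 * (lo + hi)
    return alpha1, beta2(alpha1)

alpha1_star, alpha2_star = nash_distribution()
```

For these assumed primitives the Nash distribution is $(0.4, 0.4)$, whereas the mixed equilibrium of the unperturbed game with $x = 3$, $y = 2$ puts probability $1/2$ on row 1 and $1/3$ on column 1; the gap reflects the deliberately large perturbations chosen for the example.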

We hereafter denote the probabilities of row 1 and column 1 in the unique
Nash distributions by $\alpha^1_*$ and $\alpha^2_*$. In general, it is not the case that $\alpha^1_*$ and $\alpha^2_*$
are both strictly between zero and one, even if the original (unaugmented) game
has a unique, completely mixed equilibrium. However the Nash distribution
probabilities are strictly between zero and one if the supports of the perturbations
are sufficiently small.

Assumption 8.3. For $i = 1, 2$, the density function of $F^i$ is uniformly bounded on
its support. The density function for $F^1$ is strictly positive on some neighborhood of
$1 - x\alpha^2_*$, and the density function for $F^2$ is strictly positive on some neighborhood of
$1 - y\alpha^1_*$.

This assumption is stated in somewhat implicit form, since it involves the
density functions of the distribution functions $F^1$ and $F^2$. Restating it in terms
of the original distribution functions $\rho^1$ and $\rho^2$ of the perturbation vectors is
tedious but not difficult. A set of sufficient conditions for this assumption is that
the supports of the perturbation vectors $e^1$ and $e^2$ are connected and small
enough that the equilibrium marginal distributions are strictly between zero and
one, and the density functions for $\rho^1$ and $\rho^2$ are bounded and strictly positive
on the interior of their supports.

The  intuition  for  the  proof  of  Proposition  8.1 

The  proof  of  Proposition  8.1  is  fairly  involved,  and  it  is  easy  to  get  lost  in  the 
details.  So  before  giving  those  details,  we  sketch  the  intuition  behind  the  proof. 

We fix behavior and assessment rules where behavior is myopic with respect
to the assessments and the assessment rules conform precisely to the model of
fictitious play, as given in Section 3. We will write $\alpha^i_t$ for the probability assessed
by $-i$ at date $t$ that player $i$ will play her first action. This is a random
variable, depending on the history of play up to date $t$. We will show that
$\lim_{t \to \infty} (\alpha^1_t, \alpha^2_t) = (\alpha^1_*, \alpha^2_*)$ with probability one; since behavior is myopic, this
implies that behavior converges to the Nash equilibrium strategies.

For notational simplicity, we suppose that the players' assessments equal the
empirical distributions at all dates $t \ge 2$, which corresponds to the case of initial
weights identically equal to zero in the fictitious play model. It will be clear that
allowing for nonzero initial weight does not alter the analysis. In this case

$$\alpha^i_{t+1} = \begin{cases} (t\alpha^i_t + 1)/(t + 1), & \text{if } i \text{ plays action 1 in round } t, \text{ and} \\ t\alpha^i_t/(t + 1), & \text{if } i \text{ plays action 2 in round } t, \end{cases}$$

which is more conveniently written as

$$\alpha^i_{t+1} - \alpha^i_t = \begin{cases} (1 - \alpha^i_t)/(t + 1), & \text{if } i \text{ plays action 1 in round } t, \text{ and} \\ -\alpha^i_t/(t + 1), & \text{if } i \text{ plays action 2 in round } t. \end{cases}$$

The key thing to note here is that the size of the changes of the $\alpha^i_t$ vanishes
asymptotically, at rate $O(1/t)$.
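
The recursive form of the assessment can be checked against the empirical frequency it is meant to track. In the minimal sketch below, `t` counts the observations already averaged into the current assessment (so the first update uses `t = 0`, which corresponds to zero initial weight); the play sequence and the function name `update_assessment` are hypothetical and serve only as illustration.

```python
def update_assessment(alpha_t, t, played_first):
    """One step of the recursion: alpha moves by (1 - alpha)/(t + 1) or -alpha/(t + 1)."""
    step = (1.0 - alpha_t) if played_first else -alpha_t
    return alpha_t + step / (t + 1)

# Hypothetical record of whether the player chose her first action in each round.
plays = [True, False, True, True]

alpha = 0.0  # irrelevant once the first observation arrives (zero initial weight)
path = []
for t, played_first in enumerate(plays):
    alpha = update_assessment(alpha, t, played_first)
    path.append(alpha)

# The recursion reproduces the running empirical frequency of the first action.
empirical = sum(plays) / len(plays)
```

After four rounds the assessment equals the empirical frequency 3/4, and each increment is at most $1/(t+1)$ in absolute value, which is the $O(1/t)$ step size noted above.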

The theory of stochastic approximation (Arthur et al., 1987; Kushner and
Clark, 1978; Ljung and Söderström, 1983) shows when such asymptotically
vanishing stochastic variations can be ignored; i.e., when the successive random
variables will almost surely evolve according to the evolution of their expected
values. Starting from some value of $(\alpha^1_t, \alpha^2_t)$, compute the (conditional) expected
values $(\alpha^1_{t+1}, \alpha^2_{t+1})$, then use these to compute expected values of $(\alpha^1_{t+2}, \alpha^2_{t+2})$, and
so on. If for every starting value of $(\alpha^1_t, \alpha^2_t)$, this "successive expected values"
sequence converges to $(\alpha^1_*, \alpha^2_*)$, and if certain regularity conditions are met, then
the random process will almost surely approach $(\alpha^1_*, \alpha^2_*)$.

So fix some $(\alpha^1_t, \alpha^2_t)$ and compute the conditional expected values of $(\alpha^1_{t+1}, \alpha^2_{t+1})$.
Or, since it makes matters a bit more transparent, let us compute $E[\alpha^i_{t+1} - \alpha^i_t \mid \xi_t]$
for $i = 1, 2$, where $E[\cdot \mid \xi_t]$ denotes expectation taken with respect to $P(\cdot \mid \xi_t)$.

Given $\alpha^1_t$, the probability that player 2 will play his first strategy is
$\beta^2(\alpha^1_t) = 1 - F^2(1 - y\alpha^1_t)$, so the expectation of the difference between $\alpha^2_{t+1}$ and $\alpha^2_t$ is

$$\frac{1 - \alpha^2_t}{t + 1} \left( 1 - F^2(1 - y\alpha^1_t) \right) + \frac{-\alpha^2_t}{t + 1} F^2(1 - y\alpha^1_t).$$

This simplifies to

$$E[\alpha^2_{t+1} - \alpha^2_t \mid \xi_t] = \frac{1}{t + 1} \left( 1 - F^2(1 - y\alpha^1_t) - \alpha^2_t \right).$$

A similar calculation gives

$$E[\alpha^1_{t+1} - \alpha^1_t \mid \xi_t] = \frac{1}{t + 1} \left( F^1(1 - x\alpha^2_t) - \alpha^1_t \right).$$

The fact that the step size in these difference equations is going to zero suggests
that the evolution of the "successive expected values" approximates that in the
related differential equation system

$$\frac{d\alpha^1}{dt} = F^1(1 - x\alpha^2) - \alpha^1, \quad \text{and} \quad \frac{d\alpha^2}{dt} = 1 - F^2(1 - y\alpha^1) - \alpha^2,$$

where we reinterpret the $\alpha^i$'s as the successive expected values, and the time
index has been changed: Because the amount of change between time $t$ and
$t + 1$ in the differential equations is independent of $t$, more and more steps
of the difference equations are compressed per unit of time of the differential
equation system as $t$ gets larger.
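
The "successive expected values" recursion is deterministic and can be iterated directly. The sketch below does so under assumed primitives ($x = 3$, $y = 2$, and $F^1 = F^2$ uniform on $[-1, 1]$), for which the Nash distribution works out to $(0.4, 0.4)$; all of these numbers are hypothetical and chosen only to illustrate the recursion.

```python
def F(z, w=1.0):
    """Assumed distribution function: uniform on [-w, w]."""
    return min(1.0, max(0.0, (z + w) / (2.0 * w)))

X, Y = 3.0, 2.0  # hypothetical values of x and y

def expected_path(alpha1, alpha2, steps):
    """Iterate alpha_{t+1} = alpha_t + E[alpha_{t+1} - alpha_t | xi_t] from t = 1."""
    for t in range(1, steps + 1):
        drift1 = F(1.0 - X * alpha2) - alpha1        # (t + 1) * E[change in alpha^1]
        drift2 = 1.0 - F(1.0 - Y * alpha1) - alpha2  # (t + 1) * E[change in alpha^2]
        alpha1, alpha2 = alpha1 + drift1 / (t + 1), alpha2 + drift2 / (t + 1)
    return alpha1, alpha2

end_a = expected_path(0.9, 0.1, 20000)
end_b = expected_path(0.05, 0.95, 20000)
```

From both starting points the iterates settle near $(0.4, 0.4)$. Because the step size is $1/(t+1)$, convergence is slow in calendar time, in line with the change of time index described above.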

The trajectories of this system of differential equations are most easily studied
by comparing them with those of the system

$$\frac{d\alpha^1}{dt} = F^1(1 - x\alpha^2) - \alpha^1_*, \quad \text{and} \quad \frac{d\alpha^2}{dt} = 1 - F^2(1 - y\alpha^1) - \alpha^2_*.$$

We show below that the second system has closed, convex orbits around the point
$(\alpha^1_*, \alpha^2_*)$, and that relative to the second system, the first always points strictly
inward. (See Figure 8-1.) Thus the first system spirals in towards $(\alpha^1_*, \alpha^2_*)$. This
suggests that the "successive expected value" sequence approaches $(\alpha^1_*, \alpha^2_*)$ from
any starting point, and then the methods of the theory of stochastic approximation
will yield the almost sure convergence that we desire.
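
The almost sure convergence described here can be illustrated, though of course not proved, by simulating fictitious play with random shocks. The sketch below assumes $x = 3$, $y = 2$ and draws each player's normalized shock directly from a uniform distribution on $[-1, 1]$, so that $F^1 = F^2$ is uniform; it also gives the initial assessments unit weight. These are hypothetical simplifications for illustration; under them the Nash distribution is $(0.4, 0.4)$.

```python
import random

def simulate_fictitious_play(rounds, seed=0, x=3.0, y=2.0):
    """Myopic play against empirical assessments with i.i.d. payoff shocks."""
    rng = random.Random(seed)
    alpha1, alpha2 = 0.9, 0.1  # arbitrary initial assessments
    for t in range(1, rounds + 1):
        z1 = rng.uniform(-1.0, 1.0)  # normalized shock (e^1(2) - e^1(1)) / (v^1(1,2) - v^1(2,2))
        z2 = rng.uniform(-1.0, 1.0)  # normalized shock (e^2(1) - e^2(2)) / (v^2(2,2) - v^2(2,1))
        row1 = z1 < 1.0 - x * alpha2  # player 1 best-responds to her assessment alpha2
        col1 = z2 > 1.0 - y * alpha1  # player 2 best-responds to his assessment alpha1
        alpha1 += ((1.0 if row1 else 0.0) - alpha1) / (t + 1)
        alpha2 += ((1.0 if col1 else 0.0) - alpha2) / (t + 1)
    return alpha1, alpha2

final1, final2 = simulate_fictitious_play(20000)
```

In runs of this length the assessments typically end close to $(0.4, 0.4)$, with residual noise on the order of $1/\sqrt{t}$.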

In  relating  this  intuition  to  the  proof  we  now  give,  there  are  two  things  to 
watch  for.  First,  we  will  use  the  closed  orbit  trajectories  of  the  second  system 
of  differential  equations  as  level  curves  for  a  Lyapunov  function.  Second,  we 
derive  parts  of  the  theory  of  stochastic  approximation  that  we  need,  because 
the  Lyapunov  function  we  construct  is  a  bit  less  regular  than  is  required  for  the 
general  results  as  they  are  stated  in  the  literature. 27 


27 Specifically, our Lyapunov function is not twice continuously differentiable in general.

Fig. 8-1. Dynamics of expected beliefs.
The solid curves represent trajectories of the second system of differential equations given
in the text. These closed orbits give the level sets of the Lyapunov function $L$. (Note that
these level sets are not restricted to stay inside the unit square.) The dashed arrows show
the trajectories of the first system of differential equations, the system that describes the
dynamics of "expected beliefs." (These do stay within the unit square.) Since the dashed
arrows always point inwards relative to the closed orbits of the second system, the first
system gives trajectories which spiral in towards $(\alpha^1_*, \alpha^2_*)$.

Proof  of  Proposition  8.1 

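For $(\alpha^1, \alpha^2) \in [0,1] \times [0,1]$, let $L(\alpha^1, \alpha^2)$ be the function defined by

$$L(\alpha^1, \alpha^2) = \int_{\alpha^1_*}^{\alpha^1} \left[ 1 - F^2(1 - y\beta^1) - \alpha^2_* \right] d\beta^1 - \int_{\alpha^2_*}^{\alpha^2} \left[ F^1(1 - x\beta^2) - \alpha^1_* \right] d\beta^2.$$

The function $L$ (a mnemonic for Lyapunov) has the following properties:

(a) $L(\alpha^1_*, \alpha^2_*) = 0$.

(b) If $(\alpha^1, \alpha^2) \ne (\alpha^1_*, \alpha^2_*)$, then $L(\alpha^1, \alpha^2) > 0$.

(c) $L$ is continuously differentiable with gradient vector

$$\nabla L = \left( 1 - F^2(1 - y\alpha^1) - \alpha^2_*, \; \alpha^1_* - F^1(1 - x\alpha^2) \right).$$

This gradient vector is zero at $(\alpha^1_*, \alpha^2_*)$ and nonzero everywhere else.

(d) $L$ is convex. In fact, the level curves of $L$ are trajectories of

$$\frac{d\alpha^1}{dt} = F^1(1 - x\alpha^2) - \alpha^1_* \quad \text{and} \quad \frac{d\alpha^2}{dt} = 1 - F^2(1 - y\alpha^1) - \alpha^2_*,$$

which gives closed trajectories that wind around the point $(\alpha^1_*, \alpha^2_*)$.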

Property (a) is immediate from the definition of $L$. For property (b), note first
that $1 - F^2(1 - y\alpha^1_*) = \alpha^2_*$, and $F^1(1 - x\alpha^2_*) = \alpha^1_*$. Then enlist Assumption
8.3 to note that $1 - F^2(1 - y\alpha^1)$ is nondecreasing, and it is strictly increasing
in a neighborhood of $\alpha^1_*$, and $F^1(1 - x\alpha^2)$ is nonincreasing, and it is strictly
decreasing in a neighborhood of $\alpha^2_*$. Property (b) then follows. Property (c) is
virtually a matter of definitions, together with the properties of $1 - F^2(1 - y\alpha^1)$
and $F^1(1 - x\alpha^2)$ just noted. For part (d), compute the Hessian of $L$ to show
convexity; the rest is an exercise in integration.
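
Properties (a) through (c), and the claim that the expected-belief dynamics point inward across the level curves of $L$, can be checked numerically on an example. The sketch below assumes $x = 3$, $y = 2$, uniform $F^i$ on $[-1, 1]$, and the corresponding Nash distribution $(0.4, 0.4)$; these values are hypothetical. It evaluates $L$ by midpoint-rule integration and computes the derivative of $L$ along the system $d\alpha^1/dt = F^1(1 - x\alpha^2) - \alpha^1$, $d\alpha^2/dt = 1 - F^2(1 - y\alpha^1) - \alpha^2$.

```python
def F(z, w=1.0):
    """Assumed distribution function: uniform on [-w, w]."""
    return min(1.0, max(0.0, (z + w) / (2.0 * w)))

X, Y = 3.0, 2.0               # hypothetical values of x and y
A1_STAR, A2_STAR = 0.4, 0.4   # Nash distribution for these assumed primitives

def integrate(f, a, b, n=1000):
    """Midpoint rule on [a, b]; the sign flips automatically when b < a."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def lyapunov(a1, a2):
    part1 = integrate(lambda b: 1.0 - F(1.0 - Y * b) - A2_STAR, A1_STAR, a1)
    part2 = integrate(lambda b: F(1.0 - X * b) - A1_STAR, A2_STAR, a2)
    return part1 - part2

def lyapunov_derivative(a1, a2):
    """Gradient of L dotted with the drift of the expected-belief system."""
    grad = (1.0 - F(1.0 - Y * a1) - A2_STAR, A1_STAR - F(1.0 - X * a2))
    drift = (F(1.0 - X * a2) - a1, 1.0 - F(1.0 - Y * a1) - a2)
    return grad[0] * drift[0] + grad[1] * drift[1]

grid = [(i / 10, j / 10) for i in range(11) for j in range(11) if (i, j) != (4, 4)]
positive_everywhere = all(lyapunov(a1, a2) > 0.0 for a1, a2 in grid)
decreasing_everywhere = all(lyapunov_derivative(a1, a2) < 0.0 for a1, a2 in grid)
```

On this grid $L$ is strictly positive away from $(0.4, 0.4)$ and strictly decreasing along the expected-belief dynamics, which is the behavior the intuition section describes.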

Note that $L$ is separable; i.e., $L(\alpha^1, \alpha^2) = L^1(\alpha^1) - L^2(\alpha^2)$ where we define

$$L^1(\alpha^1) = \int_{\alpha^1_*}^{\alpha^1} \left[ 1 - F^2(1 - y\beta^1) - \alpha^2_* \right] d\beta^1, \quad \text{and}$$

$$L^2(\alpha^2) = \int_{\alpha^2_*}^{\alpha^2} \left[ F^1(1 - x\beta^2) - \alpha^1_* \right] d\beta^2.$$

Now fix assessment rules according to the model of fictitious play, and
suppose that behavior is precisely myopic with respect to these assessments.

For every $\zeta_t$, $t \ge 1$, define

$$L_t(\zeta_t) = L(\alpha^1_t, \alpha^2_t) = L^1(\alpha^1_t) - L^2(\alpha^2_t).$$

In words, $L_t(\zeta_t)$ is the value of the function $L$ at the vector of assessments by the
two players. We will have proved the theorem if we prove that $\lim_{t \to \infty} L_t(\zeta_t) = 0$
with probability one, since this will imply that the beliefs are converging (with
probability one) to $\alpha^1_*$ and $\alpha^2_*$, hence (by earlier analysis and myopic behavior)
the marginal distributions over actions are converging to those values.

The first step in proving that $L_t(\zeta_t) \to 0$ is to derive an upper bound for
$E[L_{t+1}(\zeta_{t+1}) - L_t(\zeta_t) \mid \zeta_t]$. Specifically, we will show that there exists a nonpositive
continuous function $\ell$ on the unit square, with $\ell(\alpha^1, \alpha^2) < 0$ for $(\alpha^1, \alpha^2) \ne (\alpha^1_*, \alpha^2_*)$,
such that if $K$ is the uniform bound on the density of the two perturbation
scalars,

$$E[L_{t+1}(\zeta_{t+1}) - L_t(\zeta_t) \mid \zeta_t] \le \frac{\ell(\alpha^1_t, \alpha^2_t)}{t+1} + \frac{K(x+y)}{2(t+1)^2}. \qquad (*)$$

We obtain this bound by looking at the two (separable) pieces of $L_t(\zeta_t)$.
That is, we write

$$E[L_{t+1}(\zeta_{t+1}) - L_t(\zeta_t) \mid \zeta_t] = E[L^1(\alpha^1_{t+1}) - L^1(\alpha^1_t) \mid \zeta_t] - E[L^2(\alpha^2_{t+1}) - L^2(\alpha^2_t) \mid \zeta_t],$$

and we begin with estimates of each of the two expectations on the right-hand
side.

Consider first the term involving $L^1$. Given $\alpha^1_t$, we have that $\alpha^1_{t+1}$ will
equal $(t\alpha^1_t + 1)/(t + 1)$ if player 1 plays row 1 in round $t$, and $\alpha^1_{t+1}$ will equal
$t\alpha^1_t/(t + 1)$ if player 1 plays row 2. Since 1's beliefs about 2's actions are given by
$\alpha^2_t$, player 1 will play row 1 in round $t$ with probability $F^1(1 - x\alpha^2_t)$. Now

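$$L^1\!\left(\frac{t\alpha^1_t + 1}{t+1}\right) - L^1(\alpha^1_t) = \int_{\alpha^1_t}^{(t\alpha^1_t + 1)/(t+1)} \left[1 - F^2(1 - y\beta^1) - \alpha^2_*\right] d\beta^1,$$

which is bounded above by

$$\left[\frac{t\alpha^1_t + 1}{t+1} - \alpha^1_t\right]\left[1 - F^2(1 - y\alpha^1_t) - \alpha^2_*\right] + \frac{Ky}{2}\left[\frac{t\alpha^1_t + 1}{t+1} - \alpha^1_t\right]^2,$$

where $K$ is the uniform bound on the density functions of $F^1$ and $F^2$.28 Simplifying, this is

$$\frac{1 - \alpha^1_t}{t+1}\left[1 - F^2(1 - y\alpha^1_t) - \alpha^2_*\right] + \frac{Ky}{2}\,\frac{(1 - \alpha^1_t)^2}{(t+1)^2}.$$

Similarly,

$$L^1\!\left(\frac{t\alpha^1_t}{t+1}\right) - L^1(\alpha^1_t) \le -\frac{\alpha^1_t}{t+1}\left[1 - F^2(1 - y\alpha^1_t) - \alpha^2_*\right] + \frac{Ky}{2}\,\frac{(\alpha^1_t)^2}{(t+1)^2}.$$

28 This is the length of the interval of integration, times the value of the integrand at $\beta^1 = \alpha^1_t$,
plus a bound on the integral of the difference between the true integrand and the integrand at $\alpha^1_t$.
This latter bound comes from noting that $\beta \mapsto F^2(1 - y\beta)$ is Lipschitz continuous with Lipschitz
constant $Ky$.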

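Hence $E[L^1(\alpha^1_{t+1}) - L^1(\alpha^1_t) \mid \zeta_t]$ is bounded above by

$$\frac{1 - F^2 - \alpha^2_*}{t+1}\left[(1 - \alpha^1_t)F^1 - \alpha^1_t(1 - F^1)\right] + \frac{Ky}{2(t+1)^2} = \frac{1 - F^2 - \alpha^2_*}{t+1}\left[F^1 - \alpha^1_t\right] + \frac{Ky}{2(t+1)^2},$$

where we suppress the arguments (respectively $1 - x\alpha^2_t$ and $1 - y\alpha^1_t$) of $F^1$ and
$F^2$.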

Similar calculations show that $E[L^2(\alpha^2_{t+1}) - L^2(\alpha^2_t) \mid \zeta_t]$ is bounded below
by

$$\frac{F^1 - \alpha^1_*}{t+1}\left[1 - F^2 - \alpha^2_t\right] - \frac{Kx}{2(t+1)^2}.$$


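Thus

$$E[L_{t+1}(\zeta_{t+1}) - L_t(\zeta_t) \mid \zeta_t] = E[L(\alpha^1_{t+1}, \alpha^2_{t+1}) - L(\alpha^1_t, \alpha^2_t) \mid \zeta_t]$$
$$= E[L^1(\alpha^1_{t+1}) - L^1(\alpha^1_t) \mid \zeta_t] - E[L^2(\alpha^2_{t+1}) - L^2(\alpha^2_t) \mid \zeta_t]$$

is bounded above by

$$\frac{1}{t+1}\Big\{\left[1 - F^2 - \alpha^2_*\right]\left[F^1 - \alpha^1_t\right] - \left[F^1 - \alpha^1_*\right]\left[1 - F^2 - \alpha^2_t\right]\Big\} + \frac{K(x+y)}{2(t+1)^2}.$$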

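Write $[F^1 - \alpha^1_t]$ as $[F^1 - \alpha^1_* + \alpha^1_* - \alpha^1_t]$ and write $[1 - F^2 - \alpha^2_t]$ as $[1 - F^2 - \alpha^2_* + \alpha^2_* - \alpha^2_t]$,
substitute these two expressions into the upper bound, and simplify. This gives

$$E[L_{t+1}(\zeta_{t+1}) - L_t(\zeta_t) \mid \zeta_t] \le \frac{1}{t+1}\Big\{\left[1 - F^2 - \alpha^2_*\right]\left[\alpha^1_* - \alpha^1_t\right] - \left[F^1 - \alpha^1_*\right]\left[\alpha^2_* - \alpha^2_t\right]\Big\} + \frac{K(x+y)}{2(t+1)^2}.$$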
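Define

$$\ell(\alpha^1, \alpha^2) = \left[1 - F^2(1 - y\alpha^1) - \alpha^2_*\right]\left[\alpha^1_* - \alpha^1\right] - \left[F^1(1 - x\alpha^2) - \alpha^1_*\right]\left[\alpha^2_* - \alpha^2\right].$$

The signs of $1 - F^2 - \alpha^2_*$ and $\alpha^1_* - \alpha^1$ are opposite, and the signs of $F^1 - \alpha^1_*$ and
$\alpha^2_* - \alpha^2$ are the same, so that $\ell$ is a nonpositive function. Moreover, if $\alpha^1 \ne \alpha^1_*$,
then $1 - F^2 - \alpha^2_* \ne 0$, and if $\alpha^2 \ne \alpha^2_*$, then $F^1 - \alpha^1_* \ne 0$, so $\ell(\alpha^1, \alpha^2) < 0$ for
$(\alpha^1, \alpha^2) \ne (\alpha^1_*, \alpha^2_*)$. Putting everything together, this gives the bound (*).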
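
The sign argument can be checked numerically on a grid. The sketch below evaluates the function $\ell$ defined above under illustrative assumptions that are not from the paper: a logistic CDF standing in for both $F^1$ and $F^2$, $x = y = 1$, and the implied equilibrium $(1/2, 1/2)$.

```python
import math

S, X, Y = 0.1, 1.0, 1.0        # hypothetical scale and payoff parameters
A1_STAR, A2_STAR = 0.5, 0.5    # equilibrium for this symmetric choice


def F(z):
    """Hypothetical perturbation CDF (logistic centred at 1/2), used for F^1 and F^2."""
    return 1.0 / (1.0 + math.exp(-(z - 0.5) / S))


def ell(a1, a2):
    """ell(a1, a2) as defined in the text, with F^1 = F^2 = F."""
    first = (1.0 - F(1.0 - Y * a1) - A2_STAR) * (A1_STAR - a1)
    second = (F(1.0 - X * a2) - A1_STAR) * (A2_STAR - a2)
    return first - second


# Evaluate ell on a 21 x 21 grid of the unit square.
grid = [k / 20 for k in range(21)]
values = {(a1, a2): ell(a1, a2) for a1 in grid for a2 in grid}

if __name__ == "__main__":
    print("max of ell on the grid:", max(values.values()))
```

For this choice the grid maximum is attained at the equilibrium point, where $\ell$ vanishes; every other grid point gives a strictly negative value.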

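Next, for each $t$ and $\zeta_t$, let

$$\Gamma_t(\zeta_t) = E[L_t(\zeta_t) - L_{t-1}(\zeta_{t-1}) \mid \zeta_{t-1}]$$

and

$$\tilde{L}_t(\zeta_t) = L_t(\zeta_t) - \sum_{t'=1}^{t} \max\{\Gamma_{t'}(\zeta_{t'}), 0\}.$$

By construction, $\{\tilde{L}_t(\zeta_t)\}$ forms a supermartingale over the information sequence
$\{\zeta_t\}$. By the bound (*), and because $\ell$ is nonpositive,

$$\sum_{t'=1}^{t} \max\{\Gamma_{t'}(\zeta_{t'}), 0\} \le \sum_{t'=1}^{\infty} \frac{K(x+y)}{2t'^2} < \infty,$$

so $\{\tilde{L}_t\}$ is a bounded supermartingale, and hence has a limit almost surely. As
an immediate consequence, $L_t(\zeta_t)$ itself has a limit almost surely.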

Finally, it is not possible that, with positive probability, this limit is a value
greater than zero. To see this, suppose that along some history $\zeta$, $\lim_{t \to \infty} L_t(\zeta_t) > 0$.
From the construction of the function $\ell$ given previously, it is easy to show
that, in this case, $\limsup_t \ell(\alpha^1_t, \alpha^2_t) < 0$, so that for all $t > T$ for some large
$T$, $\ell(\alpha^1_t, \alpha^2_t) < \bar{\ell} < 0$. Increase $T$ if necessary so that $T > K(x+y)/|\bar{\ell}|$. Then it is
a matter of algebra to show that $\Gamma_t(\zeta_t) < \bar{\ell}/(2(t+1))$ for all $t > T$. Accordingly,
if we define $\hat{L}_t(\zeta_t)$ as $L_t(\zeta_t) - \sum_{t'=1}^{t} \Gamma_{t'}(\zeta_{t'})$, we know that $\lim_{t \to \infty} \hat{L}_t(\zeta_t) = \infty$
for those $\zeta$ where $L_t(\zeta_t)$ has nonzero limit.

But $\{\hat{L}_t(\zeta_t)\}$ is a martingale that is bounded below. If $L_t(\zeta_t)$ has nonzero
limit with positive probability, $\hat{L}_t(\zeta_t)$ has limit $\infty$ with positive probability. In
this case, $\lim_{t \to \infty} E[\hat{L}_t(\zeta_t)] = \infty$, which contradicts the fact that $\{\hat{L}_t(\zeta_t)\}$ is a
martingale. ■

Extensions  of  Proposition  8.1 

Proposition  8.1  gives  a  relatively  restricted  global  convergence  result.  It  is 
restricted  in  that  the  behavior  and  assessment  rules  that  are  permitted  are  quite 
specific;  behavior  must  be  precisely  myopic  with  respect  to  assessments  that  are 
formed according to the model of fictitious play. It is further restricted in that it
considers  only  2x2  games,  and  then  only  2x2  games  in  which  the  unaugmented 
game  has  a  single  equilibrium  that  is  completely  mixed,  and  then  only  those  for 
which  the  augmentation  satisfies  Assumption  8.3. 

Thinking first about the behavior and assessment rules, it is clear that extensions
are possible. (Indeed, extensions along these lines are suggested by Arthur
et al. (1987).) Assessments can be asymptotically empirical and behavior asymptotically
myopic, as long as the "rates of convergence" to fictitious play and myopic
behavior are sufficiently fast. All we need are the bounds on $E[L_{t+1} - L_t \mid \zeta_t]$,
which involve the two possible values of $\alpha^i_{t+1} - \alpha^i_t$ (two for $i = 1$ and two
more for $i = 2$) and the probabilities of those values; we can tolerate changes
that, in terms of these differences and probabilities, contribute differences that
are uniformly $O(1/t^2)$ or smaller. One can control the probabilities by imposing
a rate of convergence test on the sequence $\{\epsilon_t\}$ that governs the asymptotic part
of asymptotically myopic behavior. But for the differences $\alpha^i_{t+1} - \alpha^i_t$, a bit more
delicacy is called for. One's first instinct might be to impose a uniform rate of
convergence to empirical assessments; the natural condition would seem to be
that $|\alpha^i_t - \alpha^{-i}(\zeta_t)|$ is at most $O(1/t^2)$.29 But this is stronger than is needed,
since this controls deviations of $\alpha^i_t$ from empirical frequencies; we don't need
to know that $\alpha^i_t$ is "close" to empirical frequencies, but only that if $\alpha^i_t$ is fairly
far from empirical frequencies, $\alpha^i_{t+1}$ is going to be about the same distance away
(from the new empirical frequency) in the same direction. In this regard, note
that for fictitious play beliefs with nonzero initial weight vectors, $|\alpha^i_t - \alpha^{-i}(\zeta_t)|$
is $O(1/t)$. If we insisted that $|\alpha^i_t - \alpha^{-i}(\zeta_t)|$ is $O(1/t^2)$ or smaller, we would rule
out these assessment rules, for which the result does hold.

29 Here $\alpha^{-i}(\zeta_t)$ is the fraction of the time that $-i$ has played his first strategy.
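
The $O(1/t)$ rate for fictitious play beliefs with nonzero initial weights can be seen in a short sketch. The belief formula (initial weight plus count, divided by total weight plus $t$) is the usual fictitious-play form and is assumed here; the weights $(1, 1)$, the function names, and the history in which the opponent always plays his first strategy are illustrative choices only.

```python
def fictitious_play_belief(w1, w2, count1, t):
    """Assessed probability of the first strategy after t rounds, given
    hypothetical initial weights (w1, w2) and count1 observed plays of it."""
    return (w1 + count1) / (w1 + w2 + t)


def empirical_frequency(count1, t):
    """Fraction of the first t rounds in which the first strategy was played."""
    return count1 / t


def gap(w1, w2, t):
    """|belief - empirical frequency| along the history in which the first
    strategy is played in every round (so count1 equals t)."""
    return abs(fictitious_play_belief(w1, w2, t, t) - empirical_frequency(t, t))


if __name__ == "__main__":
    for t in (10, 100, 1000):
        g = gap(1.0, 1.0, t)
        # t * gap stays bounded, while t**2 * gap grows without bound.
        print(t, g, t * g, t * t * g)
```

Along this history the gap equals $w_2/(w_1 + w_2 + t)$, so it is $O(1/t)$ but not $O(1/t^2)$, which is why a uniform $O(1/t^2)$ requirement would exclude such rules.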

Extending our results to other classes of 2 x 2 games or beyond 2 x 2 games
(and to games with more than two players) seems to offer greater challenges. We
conjecture that results on local stability can be derived, along the following line.
Take an augmented game and any equilibrium distribution for that game. Write
down the continuous-time dynamics for the expected values of the empirical
frequencies, as in equations (*) above.30 If this system is locally stable at the
equilibrium values by the usual eigenvalue test, then the equilibrium will be locally
stable in the sense of this paper for fictitious-play learning dynamics. (Compare
with Arthur et al. [1987].) It is interesting to speculate whether the continuous-time
system that goes with the Shapley (1964) counterexample is unstable at the
equilibrium. If so, then its instability under fictitious play (for augmentations)
might follow.
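
As an illustration of the eigenvalue test in the 2 x 2 setting of Proposition 8.1, the sketch below assumes that the mean dynamics take the form $d\alpha^1/d\tau = F^1(1 - x\alpha^2) - \alpha^1$ and $d\alpha^2/d\tau = 1 - F^2(1 - y\alpha^1) - \alpha^2$, which is what the one-step conditional expectations in the proof suggest. The logistic perturbation, the values $x = y = 1$, and the function names are hypothetical.

```python
import cmath
import math

S = 0.1  # hypothetical scale of the logistic perturbation


def density(z):
    """Density of the hypothetical logistic perturbation centred at 1/2."""
    e = math.exp(-(z - 0.5) / S)
    return e / (S * (1.0 + e) ** 2)


def jacobian(a1, a2, x=1.0, y=1.0):
    """Jacobian of the assumed mean dynamics at the assessments (a1, a2)."""
    f1 = density(1.0 - x * a2)   # density of F^1 at its argument
    f2 = density(1.0 - y * a1)   # density of F^2 at its argument
    return ((-1.0, -x * f1), (y * f2, -1.0))


def eigenvalues_2x2(m):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


if __name__ == "__main__":
    lam1, lam2 = eigenvalues_2x2(jacobian(0.5, 0.5))
    print(lam1, lam2, "locally stable:", max(lam1.real, lam2.real) < 0.0)
```

For this example both eigenvalues have real part $-1$, so the assumed system passes the test at $(1/2, 1/2)$. Whether passing the test implies local stability of the learning dynamics is the conjecture stated above, not something the sketch establishes.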

Regardless  of  these  conjectures,  we  hope  that  the  limited  results  we  have 
managed  to  derive  here  indicate  that,  with  Harsanyi's  notion  of  purification,  it 
is  plausible  that  in  some  cases  players  would  learn  to  play  a  mixed  strategy 
equilibrium. 


30 This is conceptually straightforward, but it is not a simple exercise in practice, which is one
reason we offer conjectures instead of results.